| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Scan URL:
https://download.gluster.org/pub/gluster/glusterfs/static-analysis/master/glusterfs-coverity/2017-11-10-0f524f07/html/
ID: 9 (BAD_SHIFT)
ID: 58 (CHECKED_RETURN)
ID: 98 (DEAD_CODE)
ID: 249, 250, 251, 252 (MIXED_ENUMS)
ID: 289, 297 (NULL_RETURNS)
ID: 609, 613, 622, 644, 653, 655 (UNUSED_VALUE)
ID: 432 (RESOURCE_LEAK)
Change-Id: I2349877214dd38b789e08b74be05539f09b751b9
BUG: 789278
Signed-off-by: Milind Changire <mchangir@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Today the main users of client uuid are protocol layers, locks, leases.
Protocol layers requires each client uuid to be unique, even across
connects and disconnects. Locks and leases on the server side also use
the same client uid which changes across file migrations. Which makes the graph
switch and file migration tedious for locks and leases.
file migration across bricks becomes difficult as client uuid for the same
client, is different on the other brick.
The exact set of issues exists for leases as well.
Solution would be to introduce a constant in the client-uid string which
the locks and leases can use to identify the owner client across bricks.
Client uuid currently:
%s(ctx uuid)-%s(protocol client name)-%d(graph id)%s(setvolume count/reconnect count)
Proposed Client uuid:
"CTX_ID:%s-GRAPH_ID:%d-PID:%d-HOST:%s-PC_NAME:%s-RECON_NO:%s"
- CTX_ID: This is will be constant per client.
- GRAPH_ID, PID, HOST, PC_NAME(protocol client name), RECON_NO(setvolume count)
remains the same.
Change-Id: Ia81d57a9693207cd325d7b26aee4593fcbd6482c
BUG: 1369028
Signed-off-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
* xdr: add gfid to on wire format for fsetattr/rchecksum
* as it is change in on wire XDR format, needed backward
compatible RPC programs.
Signed-off-by: Amar Tumballi <amarts@redhat.com>
BUG: 827334
Change-Id: Id0a2da3632516dc1a5560dde2b151b2e5f0be8e5
|
|
|
|
|
|
|
|
|
|
| |
Ensure that the fop program is the first in the program list
so that there's minimum amount of time spent to search the
program for the most frequently needed use case.
Change-Id: I45c3dcdbf39ec90ba39d914432d13a2ace00a5ee
BUG: 1509647
Signed-off-by: Milind Changire <mchangir@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
For achieving the above, needed below changes too.
* more sanity into how 'frame->op' is assigned.
* infra to have 'stats' as separate section in 'xlator_t' structure
Updates #137
Change-Id: I36679bf9577f3ed00a695b4e7d92870dcb3db8e1
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
On a service request, the actor is searched using an exclusive mutex
lock which is not really necessary since most of the time the actor
list is going to be searched and not modified.
Solution:
Use a read-write lock instead of a mutex lock.
Only modify operations on a service need to be done under a write-lock
which grants exclusive access to the code.
Change-Id: Ia227351b3f794bd8eee70c7a76d833cc716ab113
BUG: 1509644
Signed-off-by: Milind Changire <mchangir@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- This diff changes all locations in the code to prefer inet6 family
instead of inet. This will allow change GlusterFS to operate
via IPv6 instead of IPv4 for all internal operations while still
being able to serve (FUSE or NFS) clients via IPv4.
- The changes apply to NFS as well.
- This diff ports D1892990, D1897341 & D1896522 to the 3.8 branch.
Test Plan: Prove tests!
Reviewers: dph, rwareing
Signed-off-by: Shreyas Siravara <sshreyas@fb.com>
Change-Id: I34fdaaeb33c194782255625e00616faf75d60c33
BUG: 1406898
Reviewed-on-3.8-fb: http://review.gluster.org/16059
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
Tested-by: Shreyas Siravara <sshreyas@fb.com>
|
|
|
|
|
|
| |
Change-Id: Idf99908aa48718a7faf7f0bbc647a679ec548282
BUG: 1443145
Signed-off-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
| |
Change-Id: I8c6f6b642f025d1faf74015b8f7aaecd7ebfd4d5
BUG: 1443145
Signed-off-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
entries to be healed
Command output:
Brick 192.168.2.8:/brick/1
Status: Connected
Total Number of entries: 363
Number of entries in heal pending: 362
Number of entries in split-brain: 0
Number of entries possibly healing: 1
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
<healInfo>
<bricks>
<brick hostUuid="9105dd4b-eca8-4fdb-85b2-b81cdf77eda3">
<name>192.168.2.8:/brick/1</name>
<status>Connected</status>
<totalNumberOfEntries>363</numberOfEntries>
<numberOfEntriesInHealPending>362</numberOfEntriesInHealPending>
<numberOfEntriesInSplitBrain>0</numberOfEntriesInSplitBrain>
<numberOfEntriesPossiblyHealing>1</numberOfEntriesPossiblyHealing>
</brick>
</bricks>
</healInfo>
<opRet>0</opRet>
<opErrno>0</opErrno>
<opErrstr/>
</cliOutput>
Change-Id: I40cb6f77a14131c9e41b292f4901b41a228863d7
BUG: 1261463
Signed-off-by: Mohamed Ashiq Liyazudeen <mliyazud@redhat.com>
Reviewed-on: https://review.gluster.org/12154
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: Karthik U S <ksubrahm@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
1. Ref counting increment on the client_t object is done in
rpcsvc_request_init() which is incorrect.
2. Ref not taken when delegating to grace_time_handler()
Solution:
1. Only fop requests which require processing down the graph via
stack 'frames' now ref count the request in get_frame_from_request()
2. Take ref on client_t object in server_rpc_notify() but avoid
dropping in RPCSVC_EVENT_TRANSPORT_DESRTROY. Drop the ref
unconditionally when exiting out of grace_time_handler().
Also, avoid dropping ref on client_t in
RPCSVC_EVENT_TRANSPORT_DESTROY when ref mangement as been
delegated to grace_time_handler()
Change-Id: Ic16246bebc7ea4490545b26564658f4b081675e4
BUG: 1481600
Reported-by: Raghavendra G <rgowdapp@redhat.com>
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: https://review.gluster.org/17982
Tested-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 5b14c11d3cae38bc66006b02217ede485ae30dea.
BUG: 1484225
Change-Id: I3269d3fc64de3f3cc6f670ea564a87d7725e10fd
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: https://review.gluster.org/18113
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Tested-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
glusterd rpc code path attempts to reconnect to rebalance process
via the reconnect timer even after the rebalance process disconnection
Solution:
Set the clnt->disabled flag to 1 to avoid reconnection and cause
the clnt object to be unref'd
Change-Id: I4e38eaef45d2fdea86d25e9dff9f1af0cd29cf66
BUG: 1484225
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: https://review.gluster.org/18093
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PROBLEM: Both attach tier and add brick have the same RPC
and set of code. This becomes a hurdle while tring to implement
add brick on a tiered volume.
FIX: This patch separates the add brick and attach tier
giving them separate RPCs.
Change-Id: Iec57e972be968a9ff00b15b507e56a4f6dc398a2
BUG: 1376326
Signed-off-by: hari gowtham <hgowtham@redhat.com>
Reviewed-on: https://review.gluster.org/15503
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: hari gowtham <hari.gowtham005@gmail.com>
Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following log entry is observed during volume operations such
as volume create:
[2017-07-20 05:13:43.213797] E [client_t.c:321:gf_client_ref]
(-->/usr/local/lib/libgfrpc.so.0(rpcsvc_request_create+0x1a4) [0x7f987f66cd20]
-->/usr/local/lib/libgfrpc.so.0(rpcsvc_request_init+0xd0) [0x7f987f66ca23]
-->/usr/local/lib/libglusterfs.so.0(gf_client_ref+0x56) [0x7f987f91cbd5] )
0-client_t: null client [Invalid argument]
Change-Id: I49ba753e8d1a828bb275b0ccb1a181706774f387
BUG: 1193929
Signed-off-by: Prashanth Pai <ppai@redhat.com>
Reviewed-on: https://review.gluster.org/17848
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Amar Tumballi <amarts@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
frames which time out at current second are missed out
Solution:
change test to include frames timing out at current second
i.e. timeout <= current.tv_sec
instead of
timeout < current.tv_sec
Change-Id: I459d47856ade2b657a0289e49f7f63da29186d6e
BUG: 1468433
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: https://review.gluster.org/17722
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Set names to threads on creation for easier
debugging.
Output of top -H -p <PID-OF-GLUSTERFSD>
Before:
19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
After:
19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustertimer
19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustermemsweep
19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc0
19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc1
19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll0
19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteridxwrker
19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteriotwr0
19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrssign
19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrswrker
19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterclogecon
19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd0
19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd1
19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd2
19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixjan
19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixfsy
25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll1
5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll2
7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixhc
Change-Id: Id5f333755c1ba168a2ffaa4fce6e71c375e10703
BUG: 1254002
Updates: #271
Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-on: https://review.gluster.org/11926
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Program
Since poller thread bears the brunt of execution till the request is
handed over to io-threads, poller thread experiencies lock
contention(s) in the control flow till io-threads, which slows it
down. This delay invariably affects reading ping requests from network
and responding to them, resulting in increased ping latencies, which
sometimes results in a ping-timer-expiry on client leading to
disconnect of transport. So, this patch aims to free up poller thread
from executing code of Glusterfs Program. We do this by making
* Glusterfs Program registering itself asking rpcsvc to execute its
actors in its own threads.
* GF-DUMP Program registering itself asking rpcsvc to _NOT_ execute
its actors in its own threads. Otherwise program's ownthreads become
bottleneck in processing ping traffic. This means that poller thread
reads a ping packet, invokes its actor and hands the response msg to
transport queue.
Change-Id: I526268c10bdd5ef93f322a4f95385137550a6a49
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
BUG: 1421938
Reviewed-on: https://review.gluster.org/17105
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When external programs perform a dlopen("..so", RTLD_LAZY|RTLD_LOCAL)
on some shared objects like xlators, it can fail with dlerror set to
error string "undefined symbol <some-type>".
This was observed for the following shared objects: fuse.so, quota.so,
quotad.so, server.so, libgfrpc.so and socket.so
P.S: This was found while running a go program which fetches the list
of xlator options (volume_option_t) from xlator's shared object.
BUG: 1193929
Change-Id: I7b958409cf11fb67c2be32a3f85a96fb1260236b
Signed-off-by: Prashanth Pai <ppai@redhat.com>
Reviewed-on: https://review.gluster.org/17659
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The most common pattern, both in our code and elsewhere, is this:
struct _xyz {
...
};
typedef struct _xyz xyz_t;
These exceptions - especially call_frame/call_stack - have been slowing
down code navigation for years. By converging on a single pattern,
navigating from xyz_t in code to the actual definition of struct _xyz
(i.e. without having to visit the typedef first) might even be
automatable.
Change-Id: I0e5dd1f51f98e000173c62ef4ddc5b21d9ec44ed
Signed-off-by: Jeff Darcy <jdarcy@fb.com>
Reviewed-on: https://review.gluster.org/17650
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: Jeff Darcy <jeff@pl.atyp.us>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I57ad970411db1ccd3d2c56c504c7da9cc221051f
BUG: 1448692
Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
Reviewed-on: https://review.gluster.org/17198
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
They snuck in with the HALO patch (07cc8679c)
Change-Id: I8ced6cbb0b49554fc9d348c453d4d5da00f981f6
BUG: 1447953
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: https://review.gluster.org/17174
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Halo Geo-replication is a feature which allows Gluster or NFS clients to write
locally to their region (as defined by a latency "halo" or threshold if you
like), and have their writes asynchronously propagate from their origin to the
rest of the cluster. Clients can also write synchronously to the cluster
simply by specifying a halo-latency which is very large (e.g. 10seconds) which
will include all bricks.
In other words, it allows clients to decide at mount time if they desire
synchronous or asynchronous IO into a cluster and the cluster can support both
of these modes to any number of clients simultaneously.
There are a few new volume options due to this feature:
halo-shd-latency: The threshold below which self-heal daemons will
consider children (bricks) connected.
halo-nfsd-latency: The threshold below which NFS daemons will consider
children (bricks) connected.
halo-latency: The threshold below which all other clients will
consider children (bricks) connected.
halo-min-replicas: The minimum number of replicas which are to
be enforced regardless of latency specified in the above 3 options.
If the number of children falls below this threshold the next
best (chosen by latency) shall be swapped in.
New FUSE mount options:
halo-latency & halo-min-replicas: As descripted above.
This feature combined with multi-threaded SHD support (D1271745) results in
some pretty cool geo-replication possibilities.
Operational Notes:
- Global consistency is gaurenteed for synchronous clients, this is provided by
the existing entry-locking mechanism.
- Asynchronous clients on the other hand and merely consistent to their region.
Writes & deletes will be protected via entry-locks as usual preventing
concurrent writes into files which are undergoing replication. Read operations
on the other hand should never block.
- Writes are allowed from _any_ region and propagated from the origin to all
other regions. The take away from this is care should be taken to ensure
multiple writers do not write the same files resulting in a gfid split-brain
which will require resolution via split-brain policies (majority, mtime &
size). Recommended method for preventing this is using the nfs-auth feature to
define which region for each share has RW permissions, tiers not in the origin
region should have RO perms.
TODO:
- Synchronous clients (including the SHD) should choose clients from their own
region as preferred sources for reads. Most of the plumbing is in place for
this via the child_latency array.
- Better GFID split brain handling & better dent type split brain handling
(i.e. create a trash can and move the offending files into it).
- Tagging in addition to latency as a means of defining which children you wish
to synchronously write to
Test Plan:
- The usual suspects, clang, gcc w/ address sanitizer & valgrind
- Prove tests
Reviewers: jackl, dph, cjh, meyering
Reviewed By: meyering
Subscribers: ethanr
Differential Revision: https://phabricator.fb.com/D1272053
Tasks: 4117827
Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
BUG: 1428061
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: http://review.gluster.org/16099
Reviewed-on: https://review.gluster.org/16177
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
The following line of code intended to remove pmap entry for the
connection during disconnects:
pmap_registry_remove (this, 0, NULL, GF_PMAP_PORT_NONE, xprt);
However, no pmap entry will have it's type set to GF_PMAP_PORT_NONE
at any point in time. So a call to pmap_registry_search_by_xprt() in
pmap_registry_remove() will always fail to find a match.
Fix:
Optionally ignore pmap entry's type in pmap_registry_search_by_xprt().
BUG: 1193929
Change-Id: I705f101739ab1647ff52a92820d478354407264a
Signed-off-by: Prashanth Pai <ppai@redhat.com>
Reviewed-on: https://review.gluster.org/17129
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem :
- Deploy gluster on 2 nodes, one brick each, one volume replicated
- Create a snapshot
- Lose one server
- Add a replacement peer and new brick with a new IP address
- replace-brick the missing brick onto the new server
(wait for replication to finish)
- peer detach the old server
- after doing above steps, glusterd fails to restart.
Solution:
With the fix detach peer will populate an error : "N2 is part of
existing snapshots. Remove those snapshots before proceeding".
While doing so we force user to stay with that peer or to delete
all snapshots.
Change-Id: I3699afb9b2a5f915768b77f885e783bd9b51818c
BUG: 1322145
Signed-off-by: Gaurav Yadav <gyadav@redhat.com>
Reviewed-on: https://review.gluster.org/16907
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 086436a introduced generation number (cleanup_gen) to ensure that
rpc layer doesn't end up cleaning up the connection object if
application layer has already destroyed it. Bumping up cleanup_gen was
done only in rpc_clnt_connection_cleanup (). However the same is needed
in rpc_clnt_reconnect_cleanup () too as with out it if the object gets destroyed
through the reconnect event in the application layer, rpc layer will
still end up in trying to delete the object resulting into double free
and crash.
Peer probing an invalid host/IP was the basic test to catch this issue.
Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46
BUG: 1433578
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-on: https://review.gluster.org/16914
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Milind Changire <mchangir@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid logging Success in the event of failure especially when errno has
no meaningful value w.r.t. the failure. In this case the errno is set to
zero when there's indeed a failure at the RPC level.
Change-Id: If2cc81aa1e590023ed22892dacbef7cac213e591
BUG: 1426032
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: https://review.gluster.org/16730
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Locking during notify was introduced as part of commit
aa22f24f5db7659387704998ae01520708869873 [1]. The fix was introduced
to fix out-of-order CONNECT/DISCONNECT events from rpc-clnt to parent
xlators [2]. However as part of handling DISCONNECT protocol/client
does unwind saved frames (with failure) waiting for responses. This
saved_frames_unwind can be a costly operation and hence ideally
shouldn't be included in the critical section of notifylock, as it
unnecessarily delays the reconnection to same brick. Also, its not a
good practise to pass control to other xlators holding a lock as it
can lead to deadlocks. So, this patch removes locking in rpc-clnt
while notifying parent xlators.
To fix [2], two changes are present in this patch:
* notify DISCONNECT before cleaning up rpc connection (same as commit
a6b63e11b7758cf1bfcb6798, patch [3]).
* protocol/client uses rpc_clnt_cleanup_and_start, which cleans up rpc
connection and does a start while handling a DISCONNECT event from
rpc. Note that patch [3] was reverted as rpc_clnt_start called in
quick_reconnect path of protocol/client didn't invoke connect on
transport as the connection was not cleaned up _yet_ (as cleanup was
moved post notification in rpc-clnt). This resulted in clients never
attempting connect to bricks.
Note that one of the neater ways to fix [2] (without using locks) is
to introduce generation numbers to map CONNECT and DISCONNECTS across
epochs and ignore DISCONNECT events if they don't belong to current
epoch. However, this approach is a bit complex to implement and
requires time. So, current patch is a hacky stop-gap fix till we come
up with a more cleaner solution.
[1] http://review.gluster.org/15916
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1386626
[3] http://review.gluster.org/15681
Change-Id: I62daeee8bb1430004e28558f6eb133efd4ccf418
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
BUG: 1427012
Reviewed-on: https://review.gluster.org/16784
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Milind Changire <mchangir@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since strcmp return a int, and since the spec
of strcmp do not tell the return value, it
could return 256 and this would overflow.
Found by Coverity scan.
(thanks to Stéphane Marcheusin who explained
the details to me)
Change-Id: I5195e05b44f8b537226e6cee178d95a1ab904e96
BUG: 789278
Signed-off-by: Michael Scherer <misc@redhat.com>
Reviewed-on: https://review.gluster.org/16738
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Tested-by: Michael Scherer <misc@fedoraproject.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I003e38b238704d3345d46688355bcf3702455ba1
BUG: 1399593
Signed-off-by: Mateusz Slupny <mateusz.slupny@appeartv.com>
[ndevos: rebased after I8ff5d1a32 moved the code around]
Reviewed-on: https://review.gluster.org/15969
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Niels de Vos <ndevos@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Prashanth Pai <ppai@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Issue:
When fio is run on multiple clients (each client writes to its own files),
and meanwhile the clients does a readdirp, thus the client which did
a readdirp will now recieve the upcalls. In this scenario the client
disconnects with rpc decode failed error.
RCA:
Upcall calls rpcsvc_request_submit to submit the request to socket:
rpcsvc_request_submit currently:
rpcsvc_request_submit () {
iobuf = iobuf_new
iov = iobuf->ptr
fill iobuf to contain xdrised upcall content - proghdr
rpcsvc_callback_submit (..iov..)
...
if (iobuf)
iobuf_unref (iobuf)
}
rpcsvc_callback_submit (... iov...) {
...
iobuf = iobuf_new
iov1 = iobuf->ptr
fill iobuf to contain xdrised rpc header - rpchdr
msg.rpchdr = iov1
msg.proghdr = iov
...
rpc_transport_submit_request (msg)
...
if (iobuf)
iobuf_unref (iobuf)
}
rpcsvc_callback_submit assumes that once rpc_transport_submit_request()
returns the msg is written on to socket and thus the buffers(rpchdr, proghdr)
can be freed, which is not the case. In especially high workload,
rpc_transport_submit_request() may not be able to write to socket immediately
and hence adds it to its own queue and returns as successful. Thus, we have
use after free, for rpchdr and proghdr. Hence the clients gets garbage rpchdr
and proghdr and thus fails to decode the rpc, resulting in disconnect.
To prevent this, we need to add the rpchdr and proghdr to a iobref and send
it in msg:
iobref_add (iobref, iobufs)
msg.iobref = iobref;
The socket layer takes a ref on msg.iobref, if it cannot write to socket and
is adding to the queue. Thus we do not have use after free.
Thank You for discussing, debugging and fixing along:
Prashanth Pai <ppai@redhat.com>
Raghavendra G <rgowdapp@redhat.com>
Rajesh Joseph <rjoseph@redhat.com>
Kotresh HR <khiremat@redhat.com>
Mohammed Rafi KC <rkavunga@redhat.com>
Soumya Koduri <skoduri@redhat.com>
Change-Id: Ifa6bf6f4879141f42b46830a37c1574b21b37275
BUG: 1421937
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: https://review.gluster.org/16613
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: soumya k <skoduri@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for multiple brick translator stacks running
in a single brick server process. This reduces our per-brick memory usage by
approximately 3x, and our appetite for TCP ports even more. It also creates
potential to avoid process/thread thrashing, and to improve QoS by scheduling
more carefully across the bricks, but realizing that potential will require
further work.
Multiplexing is controlled by the "cluster.brick-multiplex" global option. By
default it's off, and bricks are started in separate processes as before. If
multiplexing is enabled, then *compatible* bricks (mostly those with the same
transport options) will be started in the same process.
Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb
BUG: 1385758
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: https://review.gluster.org/14763
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this, we will be able to trigger statedumps on remote Gluster
clients, mainly targetted for applications using libgfapi.
Design:
SIGUSR signal is the most comman way of taking a statedump in Gluster.
But it cannot be used for libgfapi based processes, as the process
loading the library might have already consumed SIGUSR signal. Hence
going by the command way.
One has to issue a Gluster command to initiate a statedump on the
libgfapi based client. The command takes hostname and PID as an
argument. All the glusterds in the cluster, check if they are connected
to the specified hostname, and send an RPC request to all the connected
clients from that hostname (via the mgmt connection).
URL: http://review.gluster.org/16357
Change-Id: Icbe4d2f026b32a2c7d5535e1bfb2cdaaff042e91
BUG: 1169302
Signed-off-by: Poornima G <pgurusid@redhat.com>
[ndevos: minor fixes and split patch in smaller pieces]
Reviewed-on: https://review.gluster.org/9228
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Niels de Vos <ndevos@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tierd is implemented by separating from rebalance process.
The commands affected:
1) Attach tier will trigger this process instead of old one
2) tier start and tier start force will also trigger this process.
3) volume status [tier] will show tier daemon as a process instead
of task and normal tier status and tier detach status works.
4) tier stop implemented.
5) detach tier implemented separately along with new detach tier
status
6) volume tier volname status will work using the changes.
7) volume set works
This patch has separated the tier translator from the legacy
DHT rebalance code. It now sends the RPCs from the CLI
to glusterd separate to the DHT rebalance code.
The daemon is now a service, similar to the snapshot daemon,
and can be viewed using the volume status command.
The code for the validation and commit phase are the same
as the earlier tier validation code in DHT rebalance.
The “brickop” phase has been changed so that the status
command can use this framework.
The service management framework is now used.
DHT rebalance does not use this framework.
This service framework takes care of :
*) spawning the daemon, killing it and other such processes.
*) volume set options , which are written on the volfile.
*) restart and reconfigure functions. Restart is to restart
the daemon at two points
1)after gluster goes down and comes up.
2) to stop detach tier.
*) reconfigure is used to make immediate volfile changes.
By doing this, we don’t restart the daemon.
it has the code to rewrite the volfile for topological
changes too (which comes into place during add and remove brick).
With this patch the log, pid, and volfile are separated
and put into respective directories.
Change-Id: I3681d0d66894714b55aa02ca2a30ac000362a399
BUG: 1313838
Signed-off-by: hari gowtham <hgowtham@redhat.com>
Reviewed-on: http://review.gluster.org/13365
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: hari gowtham <hari.gowtham005@gmail.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When SSL is enabled or if "transport.socket.own-thread" option is set
then socket_poller is run as different thread. Currently during
disconnect or PARENT_DOWN scenario we don't wait for this thread
to terminate. PARENT_DOWN will disconnect the socket layer and
cleanup resources used by socket_poller.
Therefore before disconnect we should wait for poller thread to exit.
Change-Id: I71f984b47d260ffd979102f180a99a0bed29f0d6
BUG: 1404181
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-on: http://review.gluster.org/16141
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kaushal M <kaushal@redhat.com>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is possible that the notification thread which notifies
protocol/client layer about the disconnection is put to sleep
and meanwhile, a fuse thread or a timer thread initiates and
completes reconnection to the brick. The notification thread
is then woken up and protocol/client layer updates its flags
to indicate that network is disconnected. No reconnection is
initiated because reconnection is rpc-lib layer's responsibility
and its flags indicate that connection is connected.
Fix: Serialize connect and disconnect notify
Credit: Raghavendra Talur <rtalur@redhat.com>
Change-Id: I8ff5d1a3283b47f5c26848a42016a40bc34ffc1d
BUG: 1386626
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-on: http://review.gluster.org/15916
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When there are already existing non-granular indices created that are
yet to be healed, if granular-entry-heal option is toggled from 'off' to
'on', AFR self-heal whenever it kicks in, will try to look for granular
indices in 'entry-changes'. Because of the absence of name indices,
granular entry healing logic will fail to heal these directories, and
worse yet unset pending extended attributes with the assumption that
are no entries that need heal.
To get around this, a new CLI is introduced which will invoke glfsheal
program to figure whether at the time an attempt is made to enable
granular entry heal, there are pending heals on the volume OR there
are one or more bricks that are down. If either of them is true, the
command will be failed with the appropriate error.
New CLI: gluster volume heal <VOL> granular-entry-heal {enable,disable}
Change-Id: I1f4fe8162813b9068e198965d94169fee4adc099
BUG: 1370410
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/15747
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit a6b63e11b7758cf1bfcb67985e25ec02845f0995.
Nithya and Rajesh found that the mount fails sometimes after this patch
was merged so reverting it.
BUG: 1386626
Change-Id: I959a5b6c7da61368cf4c67c98193c6e8fdd1755d
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/15838
Reviewed-by: N Balachandran <nbalacha@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
There was a hang because unlock on an entry failed with
ENOTCONN.
Client thinks the connection is down where as server thinks
the connection is up.
This is the race we are seeing:
1) Connection from client to the brick disconnects.
2) Saved frames unwind is called which unwinds all
frames that were wound before disconnect.
3) connection from client to the brick happens and
setvolume.
4) Disconnect notification for the connection in 1)
comes now and calls client_rpc_notify() which
marks the connection to be offline even when the
connection is up.
This is happening because I/O can retrigger connection
before disconnect notification is sent to the higher
layers in rpc.
Fix:
Notify the higher layers that a disconnect happened and then
go ahead with reconnect logic.
For the logs which point to the information above check:
https://bugzilla.redhat.com/show_bug.cgi?id=1386626#c1
Thanks to Raghavendra G for suggesting the correct fix.
BUG: 1386626
Change-Id: I3c84ba1f17010bd69049fa88ec5f0ae431f8cda9
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/15681
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Address of a local variable @args is copied into state->req
in server3_3_compound (). But even after the function has gone out of
scope, in server_compound_resume () this pointer is accessed and
dereferenced. This patch fixes that.
2. Compound fops, by virtue of NOT having a vector sizer (like the one
writev has), ends up having both the header and the data (in case one of
its member fops is WRITEV) in the same hdr_iobuf. This buffer was not
being preserved through the lifetime of the compound fop, causing it to
be overwritten by a parallel write fop, even when the writev associated
with the currently executing compound fop is yet to hit the desk, thereby
corrupting the file's data. This is fixed by associating the hdr_iobuf with
the iobref so its memory remains valid through the lifetime of the fop.
3. Also fixed a use-after-free bug in protocol/client in compound fops cbk,
missed by Linux but caught by NetBSD.
Finally, big thanks to Pranith Kumar K and Raghavendra Gowdappa for their
help in debugging this file corruption issue.
Change-Id: I6d5c04f400ecb687c9403a17a12683a96c2bf122
BUG: 1378778
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/15654
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The RPC/XID for callbacks has been hardcoded to GF_UNIVERSAL_ANSWER. In
Wireshark these RPC-calls are marked as "RPC retransmissions" because of
the repeating RPC/XID. This is most confusing when verifying the
callbacks that the upcall framework sends. There is no way to see the
difference between real retransmissions and new callbacks.
This change was verified by create and removal of files through
different Gluster clients. The RPC/XID is increased on a per connection
(or client) base. The expectations of the RPC protocol are met this way.
Change-Id: I2116bec0e294df4046d168d8bcbba011284cd0b2
BUG: 1377097
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/15524
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
And minor cleanup of a few of the Makefile.am files while we're
at it.
Rewrite the make rules to do what xdrgen does. Now we can get rid
of xdrgen.
Note 1. netbsd6's sed doesn't do -i. Why are we still running
smoke tests on netbsd6 and not netbsd7? We barely support netbsd7
as it is.
Note 2. Why is/was libgfxdr.so (.../rpc/xdr/src/...) linked with
libglusterfs? A cut-and-paste mistake? It has no references to
symbols in libglusterfs.
Note3. "/#ifndef\|#define\|#endif/" (note the '\'s) is a _basic_
regex that matches the same lines as the _extended_ regex
"/#(ifndef|define|endif)/". To match the extended regex sed needs to
be run with -r on Linux; with -E on *BSD. However NetBSD's and
FreeBSD's sed helpfully also provide -r for compatibility. Using a
basic regex avoids having to use a kludge in order to run sed with
the correct option on OS X.
Note 4. Not copying the bit of xdrgen that inserts copyright/license
boilerplate. AFAIK it's silly to pretend that machine generated
files like these can be copyrighted or need license boilerplate.
The XDR source files have their own copyright and license; and
their copyrights are bound to be more up to date than old
boilerplate inserted by a script. From what I've seen of other
Open Source projects -- e.g. gcc and its C parser files generated
by yacc and lex -- IIRC they don't bother to add copyright/license
boilerplate to their generated files.
It appears that it's a long-standing feature of make (SysV, BSD,
gnu) for out-of-tree builds to helpfully pretend that the source
files it can find in the VPATH "exist" as if they are in the $cwd.
rpcgen doesn't work well in this situation and generates files
with "bad" #include directives.
E.g. if you `rpcgen ../../../../$srcdir/rpc/xdr/src/glusterfs3-xdr.x`,
you get an #include directive in the generated .c file like this:
...
#include "../../../../$srcdir/rpc/xdr/src/glusterfs3-xdr.h"
...
which (obviously) results in compile errors on out-of-tree build
because the (generated) header file doesn't exist at that location.
Compared to `rpcgen ./glusterfs3-xdr.x` where you get:
...
#include "glusterfs3-xdr.h"
...
Which is what we need. We have to resort to some Stupid Make Tricks
like the addition of various .PHONY targets to work around the VPATH
"help".
Warning: When doing an in-tree build, -I$(top_builddir)/rpc/xdr/...
looks exactly like -I$(top_srcdir)/rpc/xdr/... Don't be fooled though.
And don't delete the -I$(top_builddir)/rpc/xdr/... bits
Change-Id: Iba6ab96b2d0a17c5a7e9f92233993b318858b62e
BUG: 1330604
Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/14085
Tested-by: Niels de Vos <ndevos@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The command basically allows replace brick with src and
dst bricks as same.
Usage:
gluster v reset-brick <volname> <hostname:brick-path> start
This command kills the brick to be reset. Once this command is run,
admin can do other manual operations that they need to do,
like configuring some options for the brick. Once this is done,
resetting the brick can be continued with the following options.
gluster v reset-brick <vname> <hostname:brick> <hostname:brick> commit {force}
Does the job of resetting the brick. 'force' option should be used
when the brick already contains volinfo id.
Problem: On doing a disk-replacement of a brick in a replicate volume
the following 2 scenarios may occur :
a) there is a chance that reads are served from this replaced-disk brick,
which leads to empty reads. b) potential data loss if next writes succeed
only on replaced brick, and heal is done to other bricks from this one.
Solution: After disk-replacement, make sure that reset-brick command is
run for that brick so that pending markers are set for the brick and it
is not chosen as source for reads and heal. But, as of now replace-brick
for the same brick-path is not allowed. In order to fix the above
mentioned problem, same brick-path replace-brick is needed.
With this patch reset-brick commit {force} will be allowed even when
source and destination <hostname:brickpath> are identical as long as
1) destination brick is not alive
2) source and destination brick have the same brick uuid and path.
Also, the destination brick after replace-brick will use the same port
as the source brick.
Change-Id: I440b9e892ffb781ea4b8563688c3f85c7a7c89de
BUG: 1266876
Signed-off-by: Anuradha Talur <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/12250
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
http://review.gluster.org/14085 fixes a/the "leak" - via the
generated rpc/xdr headers - of pragmas that mask these warnings.
However 14085 won't pass the smoke test until all the warnings are
fixed.
Change-Id: I20d91091bee0bf8f198a307ebba4b284bc3817ff
BUG: 1369124
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/15240
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently there is no existing CLI that can be used to get the
local state representation of the cluster as maintained in glusterd
in a readable as well as parseable format.
The CLI added has the following usage:
# gluster get-state [daemon] [odir <path/to/output/dir>] [file <filename>]
This would dump data points that reflect the local state
representation of the cluster as maintained in glusterd (no other
daemons are supported as of now) to a file inside the specified
output directory. The default output directory and filename is
/var/run/gluster and glusterd_state_<timestamp> respectively. The
option for specifying the daemon name leaves room to add support for
other daemons in the future. Following are the data points captured
as of now to represent the state from the local glusterd pov:
* Peer:
- Primary hostname
- uuid
- state
- connection status
- List of hostnames
* Volumes:
- name, id, transport type, status
- counts: bricks, snap, subvol, stripe, arbiter, disperse,
redundancy
- snapd status
- quorum status
- tiering related information
- rebalance status
- replace bricks status
- snapshots
* Bricks:
- Path, hostname (for all bricks these info will be shown)
- port, rdma port, status, mount options, filesystem type and
signed in status for bricks running locally.
* Services:
- name, online status for initialised services
* Others:
- Base port, last allocated port
- op-version
- MYUUID
Change-Id: I4a45cc5407ab92d8afdbbd2098ece851f7e3d618
BUG: 1353156
Signed-off-by: Samikshan Bairagya <samikshan@gmail.com>
Reviewed-on: http://review.gluster.org/14873
Reviewed-by: Avra Sengupta <asengupt@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CID:1357876
Change-Id: I34eee0bf0367f963ddf6e9d6b28b7d3a9af12c90
BUG: 789278
Signed-off-by: Muthu-vigneshwaran <mvignesh@redhat.com>
Reviewed-on: http://review.gluster.org/15094
Tested-by: Muthu Vigneshwaran
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PROBLEM:
1. Freeing up rpc_clnt object might lead to crashes. Well,
it was not a necessity to free rpc-clnt object till now
because all the existing use cases needs to reconnect
back on disconnects. Hence timer code was not taking
ref on rpc-clnt object.
Glusterd had some use-cases that led to crash due to
ping-timer and they fixed only those code paths that
involve ping-timer.
Now, since changelog has an use-case where rpc-clnt
need to be freed up, we need to fix timer code to take
refs
2. In changelog, because of issue 1, only mydata was being
freed which is incorrect. And there are races where
rpc-clnt object would access the freed mydata which
would lead to crashes.
Since changelog xlator resides on brick side and is long
living process, if multiple libgfchangelog consumers
register to changelog and disconnect/reconnect mulitple
times, it would result in leak of 'rpc-clnt' object
for every connect/disconnect.
SOLUTION:
1. Handle ref/unref of 'rpc_clnt' structure in timer
functions properly.
2. In changelog, unref 'rpc_clnt' in RPC_CLNT_DISCONNECT
after disabling timers and free mydata on RPC_CLNT_DESTROY.
RPC SETUP IN CHANGELOG:
1. changelog xlator initiates rpc server say 'changelog_rpc_server'
2. libgfchangelog initiates one rpc server say 'libgfchangelog_rpc_server'
3. libgfchangelog initiates rpc client and connects to 'changelog_rpc_server'
4. In return changelog_rpc_server initiates a rpc client and connects back
to 'libgfchangelog_rpc_server'
REF/UNREF HANDLING IN TIMER FUNCTIONS:
Let's say rpc clnt refcount = 1
1. Take the ref before reigstering callback to timer queue
>>>> rpc_clnt_ref (say ref count becomes = 2)
2. Register a callback to timer say 'callback1'
3. If register fails:
>>>> rpc_clnt_unref (ref count = 1)
4. On timer expiration, 'callback1' gets called. So unref rpc clnt at the end
in 'callback1'. This is corresponding to ref taken in step 1
>>>> rpc_clnt_unref (ref count = 1)
5. The cycle from step-1 to step-4 continues....until timer cancel event happens
6. timer cancel of say 'callback1'
If timer cancel fails:
Do nothing, Step-4 would have unrefd
If timer cancel succeeds:
>>>> rpc_clnt_unref (ref count = 1)
Change-Id: I91389bc511b8b1a17824941970ee8d2c29a74a09
BUG: 1316178
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/13658
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Once dynstr is set into a dict by function dict_set_dynstr, its free
operation will be called by this dict when the dict is destroyed.
Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
Change-Id: Idd2bd19a041bcb477e1c897428ca1740fb75c5f3
BUG: 1354141
Reviewed-on: http://review.gluster.org/14882
Tested-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We use signin, but not signup. Having both just introduces confusion.
The proc number has been retained to avoid changes to the numbering of
other procs, and the mapping to a name has similarly been retained as a
placeholder, but the code and structure definitions have been removed.
Change-Id: I60f64f3b5d71ba6ed6862b36a38f90a9c8271c9f
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/14792
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Cleaning dead initializations.
Change-Id: I53ae506593cd10441d61df3b2c9249544a7871f7
BUG: 1254067
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-on: http://review.gluster.org/11932
Tested-by: Prasanna Kumar Kalever <pkalever@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
|