glusterfs.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	rpc: Synchronize slot allocation code	Mohit Agrawal	2019-12-26	1	-34/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Current slot allocation/deallocation code path is not synchronized.There are scenario when due to race condition in slot allocation/deallocation code path brick is crashed. Solution: Synchronize slot allocation/deallocation code path to avoid the issue > Change-Id: I4fb659a75234218ffa0e5e0bf9308f669f75fc25 > Fixes: bz#1763036 > Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> > (cherry picked from commit faf5ac13c4ee00a05e9451bf8da3be2a9043bbf2) Change-Id: I4fb659a75234218ffa0e5e0bf9308f669f75fc25 Fixes: bz#1778182 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	core/syncop: Bail out if frame creation fails	Soumya Koduri	2019-09-23	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	There could be cases (either due to insufficient memory or corrupted mem-pool) due to which frame creation fails. Bail out with error in such cases. This is the backport of below mainline fix - > Fixes: bz#1748448 > review url: https://review.gluster.org/#/c/glusterfs/+/23350/ Change-Id: I8cc0a5852f6f04d2bac991e4eb79ecb42577da11 Fixes: bz#1751557 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
*	event: rename event_XXX with gf_ prefixed	Xiubo Li	2019-08-28	3	-40/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I hit one crash issue when using the libgfapi. In the libgfapi it will call glfs_poller() --> event_dispatch() in file api/src/glfs.c:721, and the event_dispatch() is defined by libgluster locally, the problem is the name of event_dispatch() is the extremly the same with the one from libevent package form the OS. For example, if a executable program Foo, which will also use and link the libevent and the libgfapi at the same time, I can hit the crash, like: kernel: glfs_glfspoll[68486]: segfault at 1c0 ip 00007fef006fd2b8 sp 00007feeeaffce30 error 4 in libevent-2.0.so.5.1.9[7fef006ed000+46000] The link for Foo is: lib_foo_LADD = -levent $(GFAPI_LIBS) It will crash. This is because the glfs_poller() is calling the event_dispatch() from the libevent, not the libglsuter. The gfapi link info : GFAPI_LIBS = -lacl -lgfapi -lglusterfs -lgfrpc -lgfxdr -luuid If I link Foo like: lib_foo_LADD = $(GFAPI_LIBS) -levent It will works well without any problem. And if Foo call one private lib, such as handler_glfs.so, and the handler_glfs.so will link the GFAPI_LIBS directly, while the Foo won't and it will dlopen(handler_glfs.so), then the crash will be hit everytime. The link info will be: foo_LADD = -levent libhandler_glfs_LIBADD = $(GFAPI_LIBS) I can avoid the crash temporarily by linking the GFAPI_LIBS in Foo too like: foo_LADD = $(GFAPI_LIBS) -levent libhandler_glfs_LIBADD = $(GFAPI_LIBS) But this is ugly since the Foo won't use any APIs from the GFAPI_LIBS. And in some cases when the --as-needed link option is added(on many dists it is added as default), then the crash is back again, the above workaround won't work. Backport of: > https://review.gluster.org/#/c/glusterfs/+/23110/ > Change-Id: I38f0200b941bd1cff4bf3066fca2fc1f9a5263aa > Fixes: #699 > Signed-off-by: Xiubo Li <xiubli@redhat.com> Change-Id: I38f0200b941bd1cff4bf3066fca2fc1f9a5263aa updates: bz#1740525 Signed-off-by: Xiubo Li <xiubli@redhat.com> (cherry picked from commit 799edc73c3d4f694c365c6a7c27c9ab8eed5f260)
*	ctime: Fix incorrect realtime passed to frame->root->ctime	Kotresh HR	2019-08-28	3	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On systems that don't support "timespec_get"(e.g., centos6), it was using "clock_gettime" with "CLOCK_MONOTONIC" to get unix epoch time which is incorrect. This patch introduces "timespec_now_realtime" which uses "clock_gettime" with "CLOCK_REALTIME" which fixes the issue. Backport of: > Patch: https://review.gluster.org/23274/ > Change-Id: I57be35ce442d7e05319e82112b687eb4f28d7612 > Signed-off-by: Kotresh HR <khiremat@redhat.com> > BUG: 1743652 (cherry picked from commit d14d0749340d9cb1ef6fc4b35f2fb3015ed0339d) Change-Id: I57be35ce442d7e05319e82112b687eb4f28d7612 Signed-off-by: Kotresh HR <khiremat@redhat.com> fixes: bz#1726175
*	ctime: Set mdata xattr on legacy files	Kotresh HR	2019-08-06	5	-0/+90
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: The files which were created before ctime enabled would not have "trusted.glusterfs.mdata"(stores time attributes) xattr. Upon fops which modifies either ctime or mtime, the xattr gets created with latest ctime, mtime and atime, which is incorrect. It should update only the corresponding time attribute and rest from backend Solution: Creating xattr with values from brick is not possible as each brick of replica set would have different times. So create the xattr upon successful lookup if the xattr is not created Note To Reviewers: The time attributes used to set xattr is got from successful lookup. Instead of sending the whole iatt over the wire via setxattr, a structure called mdata_iatt is sent. The mdata_iatt contains only time attributes. Backport of: > Patch: https://review.gluster.org/22936 > Change-Id: I5e535631ddef04195361ae0364336410a2895dd4 > BUG: 1593542 > Signed-off-by: Kotresh HR <khiremat@redhat.com> Change-Id: I5e535631ddef04195361ae0364336410a2895dd4 updates: bz#1733885 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	core: fix deadlock between statedump and fd_anonymous()	Xavi Hernandez	2019-07-16	1	-76/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There exists a deadlock between statedump generation and fd_anonymous() function because they are acquiring inode table lock and inode lock in reverse order. This patch modifies fd_anonymous() so that it takes inode lock only when it's really necessary, avoiding the deadlock. Backport of: > Change-Id: I24355447f0ea1b39e2546782ad07f0512cc381e7 > BUG: 1727068 > Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Change-Id: I24355447f0ea1b39e2546782ad07f0512cc381e7 Fixes: bz#1729952 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	features/shard: Fix block-count accounting upon truncate to lower size	Krutika Dhananjay	2019-07-03	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of: > BUG: bz#1705884 > Change-Id: I9128a192e9bf8c3c3a959e96b7400879d03d7c53 > Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> The way delta_blocks is computed in shard is incorrect, when a file is truncated to a lower size. The accounting only considers change in size of the last of the truncated shards. FIX: Get the block-count of each shard just before an unlink at posix in xdata. Their summation plus the change in size of last shard (from an actual truncate) is used to compute delta_blocks which is used in the xattrop for size update. Change-Id: I9128a192e9bf8c3c3a959e96b7400879d03d7c53 fixes: bz#1716871 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> (cherry picked from commit 400b66d568ad18fefcb59949d1f8368d487b9a80)
*	core: fix memory allocation issues	Xavi Hernandez	2019-07-03	2	-27/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Two problems have been identified that caused that gluster's memory usage were twice higher than required. 1. An off by 1 error caused that all objects allocated from the memory pools were taken from a pool bigger than required. Since each pool corresponds to a size equal to a power of two, this was wasting half of the available memory. 2. The header information used for accounting on each memory object was not taken into consideration when searching for a suitable memory pool. It was added later when each individual block was allocated. This made this space "invisible" to memory accounting. Credits: Thanks to Nithya Balachandran for identifying this problem and testing this patch. Backport of: > BUG: 1722802 > Change-Id: I90e27ad795fe51ca11c13080f62207451f6c138c > Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Fixes: bz#1724210 Change-Id: I90e27ad795fe51ca11c13080f62207451f6c138c Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	libglusterfs: Fix compilation when --disable-mempool is used	Pranith Kumar K	2019-07-03	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \|	Backport of: > BUG: 1193929 > Change-Id: I245c065b209bcce5db939b6a0a934ba6fd393b47 > Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Updates: bz#1724210 Change-Id: I245c065b209bcce5db939b6a0a934ba6fd393b47 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	mem-pool.{c\|h}: minor changes	Yaniv Kaul	2019-07-03	1	-25/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. Removed some code that was not needed. It did not really do anything. 2. CALLOC -> MALLOC in one place. Compile-tested only! Backport of: > BUG: 1193929 > Signed-off-by: Yaniv Kaul <ykaul@redhat.com> > Change-Id: I4419161e1bb636158e32b5d33044b06f1eef2449 Updates: bz#1724210 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I4419161e1bb636158e32b5d33044b06f1eef2449
*	core: avoid dynamic TLS allocation when possible	Xavi Hernandez	2019-07-03	6	-431/+105
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some interdependencies between logging and memory management functions make it impossible to use the logging framework before initializing memory subsystem because they both depend on Thread Local Storage allocated through pthread_key_create() during initialization. This causes a crash when we try to log something very early in the initialization phase. To prevent this, several dynamically allocated TLS structures have been replaced by static TLS reserved at compile time using '__thread' keyword. This also reduces the number of error sources, making initialization simpler. Backport of: > BUG: 1193929 > Change-Id: I8ea2e072411e30790d50084b6b7e909c7bb01d50 > Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Updates: bz#1724210 Change-Id: I8ea2e072411e30790d50084b6b7e909c7bb01d50 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	core: fix hang issue in __gf_free	Susant Palai	2019-06-28	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently GF_ASSERT is done under mem_accounting lock at some places. On a GF_ASSERT failure, gf_msg_callingfn is called which calls gf_malloc internally and it takes the same mem_accounting lock leading to deadlock. This is a temporary fix to avoid any hang issue in master. https://review.gluster.org/#/c/glusterfs/+/22589/ is being worked on in the mean while so that GF_ASSERT can be used under mem_accounting lock. Backport of: > Change-Id: I6d67f23979e7edd2695bdc6aab2997dae4a4060a > BUG: 1700865 > Signed-off-by: Susant Palai <spalai@redhat.com> Change-Id: I6d67f23979e7edd2695bdc6aab2997dae4a4060a Updates: bz#1724210 Signed-off-by: Susant Palai <spalai@redhat.com>
*	mem-pool: remove dead code.	Yaniv Kaul	2019-06-28	2	-81/+0
\| \| \| \| \| \| \| \| \| \| \|	Backport of: > Change-Id: I3bbda719027b45e1289db2e6a718627141bcbdc8 > BUG: 1193929 > Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I3bbda719027b45e1289db2e6a718627141bcbdc8 updates: bz#1724210 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	core: handle memory accounting correctly	Xavi Hernandez	2019-04-23	4	-114/+105
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a translator stops, memory accounting for that translator is not destroyed (because there could remain memory allocated that references it), but mutexes that coordinate updates of memory accounting were destroyed. This caused incorrect memory accounting and even crashes in debug mode. This patch also fixes some other things: * Reduce the number of atomic operations needed to manage memory accounting. * Correctly account memory when realloc() is used. * Merge two critical sections into one. * Cleaned the code a bit. Backport of: > Change-Id: Id5eaee7338729b9bc52c931815ca3ff1e5a7dcc8 > BUG: bz#1659334 > Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Change-Id: Id5eaee7338729b9bc52c931815ca3ff1e5a7dcc8 Fixes: bz#1702271 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	cluster-syncop: avoid duplicate unlock of inodelk/entrylk	Kinglong Mee	2019-04-17	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When using ec, there are many messages at brick log as, [inodelk.c:514:__inode_unlock_lock] 0-test-locks: Matching lock not found for unlock 0-9223372036854775807, lo=68e040a84b7f0000 on 0x7f208c006f78 [MSGID: 115053] [server-rpc-fops_v2.c:280:server4_inodelk_cbk] 0-test-server: 2557439: INODELK <gfid:df4e41be-723f-4289-b7af-b4272b3e880c> (df4e41be-723f-4289-b7af-b4272b3e880c), client: CTX_ID:67d4a7f3-605a-4965-89a5-31309d62d1fa-GRAPH_ID:0-PID:1659-HOST:openfs-node2-PC_NAME:test-client-1-RECON_NO:-28, error-xlator: test-locks [Invalid argument] > Change-Id: Ib164d29ebb071f620a4ca9679c4345ef7c88512a > Signed-off-by: Kinglong Mee <mijinlong@open-fs.com> (cherry-pick of https://review.gluster.org/#/c/glusterfs/+/22377/) Change-Id: I6e0eaba6aca6cd99ba2a5ae2e580167d54d8ea26 Updates: bz#1690950 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
*	core: Log level changes do not effect on running client process	Mohit Agrawal	2019-04-16	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: commit c34e4161f3cb6539ec83a9020f3d27eb4759a975 set log-level per xlator during reconfigure only for a brick process not for the client process. Solution: 1) Change per xlator log-level only if brick_mux is enabled.To make sure about brick multiplex introudce a flag brick_mux at ctx->cmd_args. Note: There are two other changes done with this patch 1) Ignore client-log-level option to attach a brick with already running brick if brick_mux is enabled 2) Add a log to print pid of the running process to make easier debugging > Change-Id: I39e85de778e150d0685cd9a79425ce8b4783f9c9 > Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> > Fixes: bz#1696046 > (Cherry picked from commit 798aadbe51a9a02dd98a0f861cc239ecf7c8ed57) > (Reviewed on upstream link https://review.gluster.org/#/c/glusterfs/+/22495/) Change-Id: If91682830f894ab8f6857f19dcb1797fc15ca64c Fixes: bz#1699715 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	logging: Fix GF_LOG_OCCASSIONALLY API	Atin Mukherjee	2019-04-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	GF_LOG_OCCASSIONALLY doesn't log on the first instance rather at every 42nd iterations which isn't effective as in some cases we might not have the code flow hitting the same log for as many as 42 times and we'd end up suppressing the log. Updates: bz#1679904 Change-Id: Iee293281d25a652b64df111d59b13de4efce06fa Signed-off-by: Atin Mukherjee <amukherj@redhat.com> (cherry picked from commit d0d3e10d44366c68fc153e48b229e72a4aa26e61)
*	glusterfsd: Multiple shd processes are spawned on brick_mux environment	Mohit Agrawal	2019-03-12	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Multiple shd processes are spawned while starting volumes in the loop on brick_mux environment.glusterd spawn a process based on a pidfile and shd daemon is taking some time to update pid in pidfile due to that glusterd is not able to get shd pid Solution: Commit cd249f4cb783f8d79e79468c455732669e835a4f changed the code to update pidfile in parent for any gluster daemon after getting the status of forking child in parent.To resolve the same correct the condition update pidfile in parent only for glusterd and for rest of the daemon pidfile is updated in child > Change-Id: Ifd14797fa949562594a285ec82d58384ad717e81 > fixes: bz#1684404 > (Cherry pick from commit 66986594a9023c49e61b32769b7e6b260b600626) > (Reviewed on upstream link https://review.gluster.org/#/c/glusterfs/+/22290/) Change-Id: I9a68064d2da1acd0ec54b4071a9995ece0c3320c fixes: bz#1683880 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	core: make compute_cksum function op_version compatible	Sanju Rakonde	2019-03-08	3	-7/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: commit 5a152a changed the mechansim of computing the checksum. In heterogeneous cluster, peers are running into rejected state because we have different cksum computation mechansims in upgraded and non-upgraded nodes. Solution: add a check for op-version so that all the nodes in the cluster follow the same mechanism for computing the cksum. fixes: bz#1684029 > Change-Id: I1508f000e8c9895588b6011b8b6cc0eda7102193 > BUG: bz#1685120 > Signed-off-by: Sanju Rakonde <srakonde@redhat.com> > (cherry picked from commit 073444b693b7a91c42963512e0fdafb57ad46670) Change-Id: I1508f000e8c9895588b6011b8b6cc0eda7102193
*	socket: socket event handlers now return void	Milind Changire	2019-03-02	4	-18/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Returning any value from socket event handlers to the event sub-system doesn't make sense since event sub-system cannot handle socket sub-system errors. Solution: Change return type of all socket event handlers to 'void' mainline: > Change-Id: I70dc2c57f12b7ea2fae41120f71aa0d7fe0b2b6f > Fixes: bz#1651246 > Signed-off-by: Milind Changire <mchangir@redhat.com> > Reviewed-on: https://review.gluster.org/c/glusterfs/+/22221 Change-Id: I70dc2c57f12b7ea2fae41120f71aa0d7fe0b2b6f Fixes: bz#1683900 Signed-off-by: Milind Changire <mchangir@redhat.com> (cherry picked from commit 776ba851c6ee6c265253d44cf1d6e4e3d4a21772)
*	glusterd: manage upgrade to current master	Amar Tumballi	2019-02-04	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Scenarios tested: * Upgrade the node when there are stripe / tiering and regular type of volumes are present. - All volumes are started fine (as the change was not on brick volfile) - For tier, the functionality may not even work, as changetimerecorder is not present. - 'gluster volume info' properly shows as 'NOT SUPPORTED' for stripe and tier type of volume. * Upgrade in a rolling upgrade scenario, where an old version is able to connect to higher master. - on a normal volume, if the volfile-server was new, the newer client volfiles needed to have utime xlator conditionally. - with this one change, all other changes seem to work fine. Change-Id: Ib2d3b69dafa02b2c695a735b13c1aa70aba07cb8 updates: bz#1635688 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	mount/fuse: expose auto-invalidation as a mount option	Raghavendra Gowdappa	2019-02-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Auto invalidation is necessary when same (meta)data is shared/access across multiple mounts. However, if (meta)data is not shared, all relevant I/O goes through the cache of single mount and hence is coherent with (meta)data on bricks always. So, fuse-auto-invalidation can be disabled for this case which gives a huge performance boost for workloads that write data and then immediately read the data they just wrote. From glusterfs --help, <snip> --auto-invalidation[=BOOL] controls whether fuse-kernel can auto-invalidate attribute, dentry and page-cache. Disable this only if same files/directories are not accessed across two different mounts concurrently [default: "on"] </snip> Details on how disabling auto-invalidation helped to reduce pgbench init times can be found at [1]. Time taken for pgbench init of scale 8000 was 8340s. That will be an improvement of 86% (59280s vs 8340s) with auto-invalidations turned off along with other optimizations. Just disabling auto-invalidation contributed 56% improvement by reducing the total time taken by 33260s. [1] https://www.spinics.net/lists/gluster-devel/msg25907.html Change-Id: I0ed730dba9064bd9c576ad1800170a21e100e1ce Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> updates: bz#1664934
*	core: make gf_thread_create() easier to use	Xavi Hernandez	2019-02-01	5	-59/+105
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch creates a specific function to set the thread name using a string format and a variable argument list, like printf(). This function is used to set the thread name from gf_thread_create(), which now accepts a variable argument list to create the full name. It's not necessary anymore to use a local array to build the name of the thread. This is done automatically. Change-Id: Idd8d01fd462c227359b96e98699f8c6d962dc17c Updates: bz#1193929 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	core: move "dict is NULL" logs to DEBUG log level	Milind Changire	2019-02-01	1	-2/+2
\| \| \| \| \| \| \| \| \|	Too many logs get printed if dict_ref() and dict_unref() are passed NULL pointer. fixes: bz#1671213 Change-Id: I18afd849d64318f68baa7b549ee310dac0e1e786 Signed-off-by: Milind Changire <mchangir@redhat.com>
*	syncop: remove unnecessary call to gf_backtrace_save()	Xavi Hernandez	2019-01-31	2	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \|	A call to gf_backtrace_save() was done on each context switch of a synctask. The backtrace is generated writing to the filesystem, so it can have an important impact on latency. The generated backtrace was not used anywhere, so it's been removed. Change-Id: I399a93b932c5b6e981c696c72c3e1ef44710ba52 Updates: bz#1193929 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	core: heketi-cli is throwing error "target is busy"	Mohit Agrawal	2019-01-31	4	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: When rpc-transport-disconnect happens, server_connection_cleanup_flush_cbk() is supposed to call rpc_transport_unref() after open-files on that transport are flushed per transport.But open-fd-count is maintained in bound_xl->fd_count, which can be incremented/decremented cumulatively in server_connection_cleanup() by all transport disconnect paths. So instead of rpc_transport_unref() happening per transport, it ends up doing it only once after all the files on all the transports for the brick are flushed leading to rpc-leaks. Solution: To avoid races maintain fd_cnt at client instead of maintaining on brick Credits: Pranith Kumar Karampuri Change-Id: I6e8ea37a61f82d9aefb227c5b3ab57a7a36850e6 fixes: bz#1668190 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	Multiple files: reduce work while under lock.	Yaniv Kaul	2019-01-29	4	-11/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Mostly, unlock before logging. In some cases, moved different code that was not needed to be under lock (for example, taking time, or malloc'ing) to be executed before taking the lock. Note: logging might be slightly less accurate in order, since it may not be done now under the lock, so order of logs is racy. I think it's a reasonable compromise. Compile-tested only! updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I2438710016afc9f4f62a176ef1a0d3ed793b4f89
*	socket: fix issue on concurrent handle of a socket	Zhang Huan	2019-01-28	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Found an issue on concurrent invoke of event handler to the same socket fd, causing memory corruption. This issue arises after applying commit "socket: Remove redundant in_lock in incoming message handling" that removes priv->in_lock to serialize socket read. The following call sequence describes how concurrent socket event handle happens. thread 1 thread 2 thread 3 epoll_wait() return (slot->in_handler is 0) call select_on_epoll() and epoll_ctl() on fd epoll_wait() return slot->in_handler++ (slot->in_handler is 1) slot->in_handler++ (slot->in_handler is 2) call handler() call handler() Fix this issue by skip invoke of handler if there is already a handler inprogress. Change-Id: I437126ac772debcadb00993a948919c931cd607b updates: bz#1467614 Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
*	afr/self-heal:Fix wrong type checking	Ravishankar N	2019-01-24	4	-1/+28
\| \| \| \| \| \| \| \| \| \|	gf_dirent struct has d_type variable which should check with DT_DIR istead of IA_IFDIR or IA_IFDIR has to compare with entry->d_stat.ia_type Change-Id: Idf1059ce2a590734bc5b6adaad73604d9a708804 updates: bz#1653359 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	core: move logs which are only developer relevant to DEBUG level	Amar Tumballi	2019-01-23	3	-5/+5
\| \| \| \| \| \| \| \| \| \| \|	We had only changed the log level to DEBUG in release branch earlier. But considering 90%+ of our deployments happen in same env, we can look at these specific logs on need basis. With this change, the master branch will be easier to debug with lesser logs. Change-Id: I4157a7ec7d5ec9c2948b2bbc1e4cb8317f28d6b8 Updates: bz#1666833 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	rpc: use address-family option from vol file	Milind Changire	2019-01-22	3	-2/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch helps enable IPv6 connections in the cluster. The default address-family is IPv4 without using this option explicitly. When address-family is set to "inet6" in the /etc/glusterfs/glusterd.vol file, the mount command-line also needs to have -o xlator-option="transport.address-family=inet6" added to it. This option also gets added to the brick command-line. Snapshot and gfapi use-cases should also use this option to pass in the inet6 address-family. Change-Id: I97db91021af27bacb6d7578e33ea4817f66d7270 fixes: bz#1635863 Signed-off-by: Milind Changire <mchangir@redhat.com>
*	common-utils: make vector a const parameter	Xiubo Li	2019-01-22	1	-1/+1
\| \| \| \| \| \| \| \|	To avoid the warning and preparing for adding writesame support. Updates: #617 Change-Id: I0710b1e4c240368a9bf52968bddc6e250ae2028d Signed-off-by: Xiubo Li <xiubli@redhat.com>
*	core: Feature added to accept CidrIp in auth.allow	Rinku Kothiya	2019-01-18	3	-6/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added functionality to gluster volume set auth.allow command to accept CIDR IP addresses. Modified few functions to isolate cidr feature so that it prevents other gluster commands such as peer probe to use cidr format ip. The functions are modified in such a way that they have an option to enable accepting of cidr format for other gluster commands if required in furture. updates: bz#1138841 Change-Id: Ie6734002a7078f1820e5df42d404411cce945e8b Credits: Mohit Agrawal Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
*	lock: Add fencing support	Susant Palai	2019-01-17	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	design reference: https://review.gluster.org/#/c/glusterfs-specs/+/21925/ This patch adds the lock preempt support. Note: The current model stores lock enforcement information as separate xattr on disk. There is another effort going in parallel to store this in stat(x) of the file. This patch is self sufficient to add fencing support. Based on the availability of the stat(x) support either I will rebase this patch or we can modify the necessary bits post merging this patch. Change-Id: If4a42f3e0afaee1f66cdb0360ad4e0c005b5b017 updates: #466 Signed-off-by: Susant Palai <spalai@redhat.com>
*	core: Resolve dict_leak at the time of destroying graph	Mohit Agrawal	2019-01-14	6	-12/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Problem: In gluster code some of the places it call's get_new_dict to create a dictionary without taking reference so at the time of dict_unref it has become a leak Solution: To resolve the same call dict_new instead of get_new_dict updates bz#1650403 Change-Id: I3ccbbf5af07079a4fa09aad2cd0458c8625b2f06 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	fix 32-bit-build-smoke warnings	Iraj Jamali	2019-01-11	3	-8/+8
\| \| \| \| \| \| \|	fixes: bz#1622665 Change-Id: I777d67b1b62c284c62a02277238ad7538eef001e Signed-off-by: Iraj Jamali <ijamali@redhat.com>
*	libglusterfs/common-utils.c: Fix buffer size for checksum computation	Varsha Rao	2019-01-11	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: When quorum count option is updated, the change is not reflected in the nfs-server.vol file. This is because in get_checksum_for_file(), when the last part of the file read has size less than buffer size, the read buffer stores old data value along with correct data value. Solution: Pass the bytes read instead of fixed buffer size, for calculating checksum. Change-Id: I4b641607c8a262961b3f3da0028a54e08c3f8589 fixes: bz#1657744 Signed-off-by: Varsha Rao <varao@redhat.com>
*	Revert "iobuf: Get rid of pre allocated iobuf_pool and use per thread mem pool"	Amar Tumballi	2019-01-08	3	-72/+877
\| \| \| \| \| \| \| \| \|	This reverts commit b87c397091bac6a4a6dec4e45a7671fad4a11770. There seems to be some performance regression with the patch and hence recommended to have it reverted. Updates: #325 Change-Id: Id85d6203173a44fad6cf51d39b3e96f37afcec09
*	gfapi: new api glfs_statx as linux's statx	ShyamsundarR	2019-01-07	1	-4/+13
\| \| \| \| \| \| \|	Change-Id: I44dd6ceef0954ae7fc13f920e84d81bbd3f6a774 Updates: #389 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com> Signed-off-by: ShyamsundarR <srangana@redhat.com>
*	multiple-files: clang-scan fixes	Amar Tumballi	2018-12-31	1	-3/+3
\| \| \| \| \| \|	updates: bz#1622665 Change-Id: I9f3a75ed9be3d90f37843a140563c356830ef945 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	libglusterfs/src/mem-types.h: remove unused common enums from mem-types.h	Yaniv Kaul	2018-12-30	1	-136/+98
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	They were not used at all, just taking space. I've also marked all those that are not common really, but used in just one place - they probably should move there (in follow-up patches) As a test, I've removed from the stripe xlator unused private enums and moved one that was in the common list, but only used in the stripe code, to be a private enum. Compile-tested only! updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I1158dc1d259f1fd3f69904336c46c9d83cea799f
*	all: handle USE_AFTER_FREE warnings	Amar Tumballi	2018-12-20	2	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* we shouldn't be using 'local' after DHT_STACK_UNWIND() as it frees the content of local. Add a 'goto out' or similar logic to handle the situation. * fix possible overlook of unref(dict), instead of unref(xdata). * make coverity happy by re-ordering unref in meta-defaults. * gfid-access: re-order dictionary allocation so we don't have to do a extra unref. * other obvious errors reported. updates: bz#789278 Change-Id: If05961ee946b0c4868df19861d7e4a927a2a2489 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	posix: use synctask for janitor	Poornima G	2018-12-19	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	With brick mux, the number of threads increases as the number of bricks increases. As an initiative to reduce the number of threads in brick mux scenario, replacing janitor thread to use synctask infra. Now close() and closedir() handle by separate janitor thread which is linked with glusterfs_ctx. Updates #475 Change-Id: I0c4aaf728125ab7264442fde59f3d08542785f73 Signed-off-by: Poornima G <pgurusid@redhat.com>
*	cluster/afr: Allow lookup on root if it is from ADD_REPLICA_MOUNT	karthik-us	2018-12-18	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: When trying to convert a plain distribute volume to replica-3 or arbiter type it is failing with ENOTCONN error as the lookup on the root will fail as there is no quorum. Fix: Allow lookup on root if it is coming from the ADD_REPLICA_MOUNT which is used while adding bricks to a volume. It will try to set the pending xattrs for the newly added bricks to allow the heal to happen in the right direction and avoid data loss scenarios. Note: This fix will solve the problem of type conversion only in the case where the volume was mounted at least once. The conversion of non mounted volumes will still fail since the dht selfheal tries to set the directory layout will fail as they do that with the PID GF_CLIENT_PID_NO_ROOT_SQUASH set in the frame->root. Change-Id: Ic511939981dad118cc946754341318b164954b3b fixes: bz#1655854 Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	iobuf: Get rid of pre allocated iobuf_pool and use per thread mem pool	Poornima G	2018-12-18	3	-877/+72
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current implementation of iobuf_pool has two problems: - prealloc of 12.5MB memory, this limits the scale factor of the gluster processes due to RAM requirements - lock contention, as the current implementation has one global iobuf_pool lock. Credits for debugging and addressing the same goes to Krutika Dhananjay <kdhananj@redhat.com>. Issue: #410 Hence changing the iobuf implementation to use per thread mem pool. This may theoritically appear to cause perf dip as there is no preallocation. But per thread mem pool will not have significant perf impact as the last allocated memory is kept alive for subsequent allocs, for some time. The worst case would be if iobufs requested are of random sizes each time. The best case is, if we get iobuf request of the same size. From the perf tests, this patch did not seem to cause any perf decrease. Note that, with this patch, the rdma performance is going to degrade drastically. In one of the previous patchsets we had fixes to not degrade rdma perf, but rdma is not supported and also not tested [1]. Hence the decision was to not have code in rdma that is not tested and not supported. [1] https://lists.gluster.org/pipermail/gluster-users.old/2018-July/034400.html Updates: #325 Change-Id: Ic2ef3bd498f9250dea25f25ba0c01fde19584b27 Signed-off-by: Poornima G <pgurusid@redhat.com>
*	mem-pool: Add api to mem_get based on requested size	Poornima G	2018-12-17	4	-27/+127
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently mem-pool implementation provides api to get from the mem pool based on the struct type. This is to retain api compatibility with the old implementation of mem pool. Internally in the mem pool structure there is a mapping from struct to size based pools. In this patch, we are adding new APIs to fetch memory from mem pool, given a size. Change-Id: Ib220ee45ebd134a7be8f6482db5a592dbb7b9211 Updates: #325 Signed-off-by: Poornima G <pgurusid@redhat.com>
*	fuse: add --lru-limit option	Amar Tumballi	2018-12-14	4	-36/+238
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The inode LRU mechanism is moot in fuse xlator (ie. there is no limit for the LRU list), as fuse inodes are referenced from kernel context, and thus they can only be dropped on request of the kernel. This might results in a high number of passive inodes which are useless for the glusterfs client, causing a significant memory overhead. This change tries to remedy this by extending the LRU semantics and allowing to set a finite limit on the fuse inode LRU. A brief history of problem: When gluster's inode table was designed, fuse didn't have any 'invalidate' method, which means, userspace application could never ask kernel to send a 'forget()' fop, instead had to wait for kernel to send it based on kernel's parameters. Inode table remembers the number of times kernel has cached the inode based on the 'nlookup' parameter. And 'nlookup' field is not used by no other entry points (like server-protocol, gfapi etc). Hence the inode_table of fuse module always has to have lru-limit as '0', which means no limit. GlusterFS always had to keep all inodes in memory as kernel would have had a reference to it. Again, the reason for this is, kernel's glusterfs inode reference was pointer of 'inode_t' structure in glusterfs. As it is a pointer, we could never free it (to prevent segfault, or memory corruption). Solution: In the inode table, handle the prune case of inodes with 'nlookup' differently, and call a 'invalidator' method, which in this case is fuse_invalidate(), and it sends the request to kernel for getting the forget request. When the kernel sends the forget, it means, it has dropped all the reference to the inode, and it will send the forget with the 'nlookup' parameter too. We just need to make sure to reduce the 'nlookup' value we have when we get forget. That automatically cause the relevant prune to happen. Credits: Csaba Henk, Xavier Hernandez, Raghavendra Gowdappa, Nithya B fixes: bz#1560969 Change-Id: Ifee0737b23b12b1426c224ec5b8f591f487d83a2 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	Multiple posix related files: several modifications	Yaniv Kaul	2018-12-14	2	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Just looked at posix.c and related code and performed some changes and cleanups. The only important one is #3 below, but surely the others (#2 and #4) need careful review. Changes to other files are as they were related to code paths in posix.c. I'll send a separate patch for other posix related files. Main changes: 1. Proper initializtion for parameters, where it made sense. 2. Logged outside the lock in several places. 3. Moved from CALLOC to MALLOC where it made sense. 4. Aligned structures. 5. moved dictionary functions to use _sizen where possible. (dict_get() -> dict_get_sizen() for example) Compile-tested only! Change-Id: Ia84699fb495e06d095339c91c1ba770d1393bb6c updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	xlator: make 'xlator_api' mandatory	Amar Tumballi	2018-12-13	2	-145/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Remove the options to load old symbol. * keep only 'xlator_api' symbol from being exported using xlator.sym * add xlator_api to all the xlators where its missing NOTE: This covers all the xlators which has at least a test case to validate its loading. If there is a translator, which doesn't have any test, then we should probably remove that from codebase. fixes: #164 Change-Id: Ibcdc8c9844cda6b4463d907a15813745d14c1ebb Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	[geo-rep]: Worker still ACTIVE after killing bricks	Mohit Agrawal	2018-12-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In changelog xlator after destroying listener it call's unlink to delete changelog socket file but socket file reference is not cleaned up from process memory Solution: 1) To cleanup reference completely from process memory serialize transport cleanup for changelog and then unlink socket file 2) Brick xlator will notify GF_EVENT_PARENT_DOWN to next xlator only after cleanup all xprts Test: To test the same run below steps 1) Setup some volume and enable brick mux 2) kill anyone brick with gf_attach 3) check changelog socket for specific to killed brick in lsof, it should cleanup completely fixes: bz#1600145 Change-Id: Iba06cbf77d8a87b34a60fce50f6d8c0d427fa491 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>