glusterfs.git/libglusterfs, branch experimental

all: handle USE_AFTER_FREE warnings

2018-12-20T06:26:37+00:00

* we shouldn't be using 'local' after DHT_STACK_UNWIND() as it frees
the content of local. Add a 'goto out' or similar logic to handle
the situation.

* fix possible overlook of unref(dict), instead of unref(xdata).

* make coverity happy by re-ordering unref in meta-defaults.

* gfid-access: re-order dictionary allocation so we don't have to
  do a extra unref.

* other obvious errors reported.

updates: bz#789278
Change-Id: If05961ee946b0c4868df19861d7e4a927a2a2489
Signed-off-by: Amar Tumballi

posix: use synctask for janitor

2018-12-19T14:36:52+00:00

With brick mux, the number of threads increases as the number of
bricks increases. As an initiative to reduce the number of
threads in brick mux scenario, replacing janitor thread to use
synctask infra.

Now close() and closedir() handle by separate janitor
thread which is linked with glusterfs_ctx.

Updates #475
Change-Id: I0c4aaf728125ab7264442fde59f3d08542785f73
Signed-off-by: Poornima G

cluster/afr: Allow lookup on root if it is from ADD_REPLICA_MOUNT

2018-12-18T10:30:19+00:00

Problem: When trying to convert a plain distribute volume to replica-3
or arbiter type it is failing with ENOTCONN error as the lookup on
the root will fail as there is no quorum.

Fix: Allow lookup on root if it is coming from the ADD_REPLICA_MOUNT
which is used while adding bricks to a volume. It will try to set the
pending xattrs for the newly added bricks to allow the heal to happen
in the right direction and avoid data loss scenarios.

Note: This fix will solve the problem of type conversion only in the
case where the volume was mounted at least once. The conversion of
non mounted volumes will still fail since the dht selfheal tries to
set the directory layout will fail as they do that with the PID
GF_CLIENT_PID_NO_ROOT_SQUASH set in the frame->root.

Change-Id: Ic511939981dad118cc946754341318b164954b3b
fixes: bz#1655854
Signed-off-by: karthik-us

iobuf: Get rid of pre allocated iobuf_pool and use per thread mem pool

2018-12-18T09:35:24+00:00

The current implementation of iobuf_pool has two problems:
- prealloc of 12.5MB memory, this limits the scale factor of the gluster
  processes due to RAM requirements
- lock contention, as the current implementation has one global
  iobuf_pool lock. Credits for debugging and addressing the same goes to
  Krutika Dhananjay . Issue: #410

Hence changing the iobuf implementation to use per thread mem pool.
This may theoritically appear to cause perf dip as there is no preallocation.
But per thread mem pool will not have significant perf impact as the last
allocated memory is kept alive for subsequent allocs, for some time.
The worst case would be if iobufs requested are of random sizes each time.
The best case is, if we get iobuf request of the same size. From the perf
tests, this patch did not seem to cause any perf decrease.

Note that, with this patch, the rdma performance is going to degrade
drastically. In one of the previous patchsets we had fixes to not
degrade rdma perf, but rdma is not supported and also not tested [1].
Hence the decision was to not have code in rdma that is not tested
and not supported.

[1] https://lists.gluster.org/pipermail/gluster-users.old/2018-July/034400.html

Updates: #325
Change-Id: Ic2ef3bd498f9250dea25f25ba0c01fde19584b27
Signed-off-by: Poornima G

mem-pool: Add api to mem_get based on requested size

2018-12-17T17:26:59+00:00

Currently mem-pool implementation provides api to get from the
mem pool based on the struct type. This is to retain api
compatibility with the old implementation of mem pool. Internally
in the mem pool structure there is a mapping from struct to size
based pools.

In this patch, we are adding new APIs to fetch memory from mem pool,
given a size.

Change-Id: Ib220ee45ebd134a7be8f6482db5a592dbb7b9211
Updates: #325
Signed-off-by: Poornima G

fuse: add --lru-limit option

2018-12-14T17:34:28+00:00

The inode LRU mechanism is moot in fuse xlator (ie. there is no
limit for the LRU list), as fuse inodes are referenced from
kernel context, and thus they can only be dropped on request of
the kernel. This might results in a high number of passive
inodes which are useless for the glusterfs client, causing a
significant memory overhead.

This change tries to remedy this by extending the LRU semantics
and allowing to set a finite limit on the fuse inode LRU.

A brief history of problem:

When gluster's inode table was designed, fuse didn't have any
'invalidate' method, which means, userspace application could
never ask kernel to send a 'forget()' fop, instead had to wait
for kernel to send it based on kernel's parameters. Inode table
remembers the number of times kernel has cached the inode based
on the 'nlookup' parameter. And 'nlookup' field is not used by
no other entry points (like server-protocol, gfapi etc).

Hence the inode_table of fuse module always has to have lru-limit
as '0', which means no limit. GlusterFS always had to keep all
inodes in memory as kernel would have had a reference to it.
Again, the reason for this is, kernel's glusterfs inode reference
was pointer of 'inode_t' structure in glusterfs. As it is a
pointer, we could never free it (to prevent segfault, or memory
corruption).

Solution:

In the inode table, handle the prune case of inodes with 'nlookup'
differently, and call a 'invalidator' method, which in this case is
fuse_invalidate(), and it sends the request to kernel for getting
the forget request.

When the kernel sends the forget, it means, it has dropped all
the reference to the inode, and it will send the forget with the
'nlookup' parameter too. We just need to make sure to reduce the
'nlookup' value we have when we get forget. That automatically
cause the relevant prune to happen.

Credits: Csaba Henk, Xavier Hernandez, Raghavendra Gowdappa, Nithya B

fixes: bz#1560969
Change-Id: Ifee0737b23b12b1426c224ec5b8f591f487d83a2
Signed-off-by: Amar Tumballi

Multiple posix related files: several modifications

2018-12-14T04:53:36+00:00

Just looked at posix.c and related code and performed
some changes and cleanups. The only important one is #3 below,
but surely the others (#2 and #4) need careful review.
Changes to other files are as they were related to code paths
in posix.c.

I'll send a separate patch for other posix related files.

Main changes:
1. Proper initializtion for parameters, where it made sense.
2. Logged outside the lock in several places.
3. Moved from CALLOC to MALLOC where it made sense.
4. Aligned structures.
5. moved dictionary functions to use _sizen where possible.
(dict_get() -> dict_get_sizen() for example)

Compile-tested only!

Change-Id: Ia84699fb495e06d095339c91c1ba770d1393bb6c
updates: bz#1193929
Signed-off-by: Yaniv Kaul

xlator: make 'xlator_api' mandatory

2018-12-13T09:11:50+00:00

* Remove the options to load old symbol.
* keep only 'xlator_api' symbol from being exported using xlator.sym
* add xlator_api to all the xlators where its missing

NOTE: This covers all the xlators which has at least a test case
to validate its loading. If there is a translator, which doesn't
have any test, then we should probably remove that from codebase.

fixes: #164
Change-Id: Ibcdc8c9844cda6b4463d907a15813745d14c1ebb
Signed-off-by: Amar Tumballi

[geo-rep]: Worker still ACTIVE after killing bricks

2018-12-13T04:46:50+00:00

Problem: In changelog xlator after destroying listener it call's
         unlink to delete changelog socket file but socket file
         reference is not cleaned up from process memory

Solution: 1) To cleanup reference completely from process memory
             serialize transport cleanup for changelog and then
             unlink socket file
          2) Brick xlator will notify GF_EVENT_PARENT_DOWN to next
             xlator only after cleanup all xprts

Test: To test the same run below steps
      1) Setup some volume and enable brick mux
      2) kill anyone brick with gf_attach
      3) check changelog socket for specific to killed brick
         in lsof, it should cleanup completely

fixes: bz#1600145

Change-Id: Iba06cbf77d8a87b34a60fce50f6d8c0d427fa491
Signed-off-by: Mohit Agrawal

core: move invalid port logs to DEBUG log level

2018-12-12T15:59:56+00:00

Stop spamming "invalid port" logs in case sysadmin has reserved a large
number of ports.

Change-Id: I244ef7693560cc404b36cadc6b05d92ec0e908d3
fixes: bz#1656517
Signed-off-by: Milind Changire