summaryrefslogtreecommitdiffstats
path: root/libglusterfs/src/glusterfs
Commit message (Collapse)AuthorAgeFilesLines
* mount/fuse: expose auto-invalidation as a mount optionRaghavendra Gowdappa2019-02-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Auto invalidation is necessary when same (meta)data is shared/access across multiple mounts. However, if (meta)data is not shared, all relevant I/O goes through the cache of single mount and hence is coherent with (meta)data on bricks always. So, fuse-auto-invalidation can be disabled for this case which gives a huge performance boost for workloads that write data and then immediately read the data they just wrote. From glusterfs --help, <snip> --auto-invalidation[=BOOL] controls whether fuse-kernel can auto-invalidate attribute, dentry and page-cache. Disable this only if same files/directories are not accessed across two different mounts concurrently [default: "on"] </snip> Details on how disabling auto-invalidation helped to reduce pgbench init times can be found at [1]. Time taken for pgbench init of scale 8000 was 8340s. That will be an improvement of 86% (59280s vs 8340s) with auto-invalidations turned off along with other optimizations. Just disabling auto-invalidation contributed 56% improvement by reducing the total time taken by 33260s. [1] https://www.spinics.net/lists/gluster-devel/msg25907.html Change-Id: I0ed730dba9064bd9c576ad1800170a21e100e1ce Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> updates: bz#1664934
* core: make gf_thread_create() easier to useXavi Hernandez2019-02-011-4/+17
| | | | | | | | | | | | | | This patch creates a specific function to set the thread name using a string format and a variable argument list, like printf(). This function is used to set the thread name from gf_thread_create(), which now accepts a variable argument list to create the full name. It's not necessary anymore to use a local array to build the name of the thread. This is done automatically. Change-Id: Idd8d01fd462c227359b96e98699f8c6d962dc17c Updates: bz#1193929 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* syncop: remove unnecessary call to gf_backtrace_save()Xavi Hernandez2019-01-311-1/+0
| | | | | | | | | | | | A call to gf_backtrace_save() was done on each context switch of a synctask. The backtrace is generated writing to the filesystem, so it can have an important impact on latency. The generated backtrace was not used anywhere, so it's been removed. Change-Id: I399a93b932c5b6e981c696c72c3e1ef44710ba52 Updates: bz#1193929 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* core: heketi-cli is throwing error "target is busy"Mohit Agrawal2019-01-312-3/+2
| | | | | | | | | | | | | | | | | | | | Problem: When rpc-transport-disconnect happens, server_connection_cleanup_flush_cbk() is supposed to call rpc_transport_unref() after open-files on that transport are flushed per transport.But open-fd-count is maintained in bound_xl->fd_count, which can be incremented/decremented cumulatively in server_connection_cleanup() by all transport disconnect paths. So instead of rpc_transport_unref() happening per transport, it ends up doing it only once after all the files on all the transports for the brick are flushed leading to rpc-leaks. Solution: To avoid races maintain fd_cnt at client instead of maintaining on brick Credits: Pranith Kumar Karampuri Change-Id: I6e8ea37a61f82d9aefb227c5b3ab57a7a36850e6 fixes: bz#1668190 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* afr/self-heal:Fix wrong type checkingRavishankar N2019-01-241-0/+3
| | | | | | | | | | gf_dirent struct has d_type variable which should check with DT_DIR istead of IA_IFDIR or IA_IFDIR has to compare with entry->d_stat.ia_type Change-Id: Idf1059ce2a590734bc5b6adaad73604d9a708804 updates: bz#1653359 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* rpc: use address-family option from vol fileMilind Changire2019-01-221-0/+3
| | | | | | | | | | | | | | | | | This patch helps enable IPv6 connections in the cluster. The default address-family is IPv4 without using this option explicitly. When address-family is set to "inet6" in the /etc/glusterfs/glusterd.vol file, the mount command-line also needs to have -o xlator-option="transport.address-family=inet6" added to it. This option also gets added to the brick command-line. Snapshot and gfapi use-cases should also use this option to pass in the inet6 address-family. Change-Id: I97db91021af27bacb6d7578e33ea4817f66d7270 fixes: bz#1635863 Signed-off-by: Milind Changire <mchangir@redhat.com>
* common-utils: make vector a const parameterXiubo Li2019-01-221-1/+1
| | | | | | | | To avoid the warning and preparing for adding writesame support. Updates: #617 Change-Id: I0710b1e4c240368a9bf52968bddc6e250ae2028d Signed-off-by: Xiubo Li <xiubli@redhat.com>
* core: Feature added to accept CidrIp in auth.allowRinku Kothiya2019-01-181-1/+2
| | | | | | | | | | | | | | | Added functionality to gluster volume set auth.allow command to accept CIDR IP addresses. Modified few functions to isolate cidr feature so that it prevents other gluster commands such as peer probe to use cidr format ip. The functions are modified in such a way that they have an option to enable accepting of cidr format for other gluster commands if required in furture. updates: bz#1138841 Change-Id: Ie6734002a7078f1820e5df42d404411cce945e8b Credits: Mohit Agrawal Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
* lock: Add fencing supportSusant Palai2019-01-171-0/+2
| | | | | | | | | | | | | | | | | design reference: https://review.gluster.org/#/c/glusterfs-specs/+/21925/ This patch adds the lock preempt support. Note: The current model stores lock enforcement information as separate xattr on disk. There is another effort going in parallel to store this in stat(x) of the file. This patch is self sufficient to add fencing support. Based on the availability of the stat(x) support either I will rebase this patch or we can modify the necessary bits post merging this patch. Change-Id: If4a42f3e0afaee1f66cdb0360ad4e0c005b5b017 updates: #466 Signed-off-by: Susant Palai <spalai@redhat.com>
* core: Resolve dict_leak at the time of destroying graphMohit Agrawal2019-01-141-2/+0
| | | | | | | | | | | | Problem: In gluster code some of the places it call's get_new_dict to create a dictionary without taking reference so at the time of dict_unref it has become a leak Solution: To resolve the same call dict_new instead of get_new_dict updates bz#1650403 Change-Id: I3ccbbf5af07079a4fa09aad2cd0458c8625b2f06 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* Revert "iobuf: Get rid of pre allocated iobuf_pool and use per thread mem pool"Amar Tumballi2019-01-081-14/+97
| | | | | | | | | This reverts commit b87c397091bac6a4a6dec4e45a7671fad4a11770. There seems to be some performance regression with the patch and hence recommended to have it reverted. Updates: #325 Change-Id: Id85d6203173a44fad6cf51d39b3e96f37afcec09
* gfapi: new api glfs_statx as linux's statxShyamsundarR2019-01-071-4/+13
| | | | | | | Change-Id: I44dd6ceef0954ae7fc13f920e84d81bbd3f6a774 Updates: #389 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com> Signed-off-by: ShyamsundarR <srangana@redhat.com>
* libglusterfs/src/mem-types.h: remove unused common enums from mem-types.hYaniv Kaul2018-12-301-136/+98
| | | | | | | | | | | | | | | | | They were not used at all, just taking space. I've also marked all those that are not common really, but used in just one place - they probably should move there (in follow-up patches) As a test, I've removed from the stripe xlator unused private enums and moved one that was in the common list, but only used in the stripe code, to be a private enum. Compile-tested only! updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I1158dc1d259f1fd3f69904336c46c9d83cea799f
* posix: use synctask for janitorPoornima G2018-12-191-0/+6
| | | | | | | | | | | | | | With brick mux, the number of threads increases as the number of bricks increases. As an initiative to reduce the number of threads in brick mux scenario, replacing janitor thread to use synctask infra. Now close() and closedir() handle by separate janitor thread which is linked with glusterfs_ctx. Updates #475 Change-Id: I0c4aaf728125ab7264442fde59f3d08542785f73 Signed-off-by: Poornima G <pgurusid@redhat.com>
* cluster/afr: Allow lookup on root if it is from ADD_REPLICA_MOUNTkarthik-us2018-12-181-1/+2
| | | | | | | | | | | | | | | | | | | | | Problem: When trying to convert a plain distribute volume to replica-3 or arbiter type it is failing with ENOTCONN error as the lookup on the root will fail as there is no quorum. Fix: Allow lookup on root if it is coming from the ADD_REPLICA_MOUNT which is used while adding bricks to a volume. It will try to set the pending xattrs for the newly added bricks to allow the heal to happen in the right direction and avoid data loss scenarios. Note: This fix will solve the problem of type conversion only in the case where the volume was mounted at least once. The conversion of non mounted volumes will still fail since the dht selfheal tries to set the directory layout will fail as they do that with the PID GF_CLIENT_PID_NO_ROOT_SQUASH set in the frame->root. Change-Id: Ic511939981dad118cc946754341318b164954b3b fixes: bz#1655854 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* iobuf: Get rid of pre allocated iobuf_pool and use per thread mem poolPoornima G2018-12-181-97/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implementation of iobuf_pool has two problems: - prealloc of 12.5MB memory, this limits the scale factor of the gluster processes due to RAM requirements - lock contention, as the current implementation has one global iobuf_pool lock. Credits for debugging and addressing the same goes to Krutika Dhananjay <kdhananj@redhat.com>. Issue: #410 Hence changing the iobuf implementation to use per thread mem pool. This may theoritically appear to cause perf dip as there is no preallocation. But per thread mem pool will not have significant perf impact as the last allocated memory is kept alive for subsequent allocs, for some time. The worst case would be if iobufs requested are of random sizes each time. The best case is, if we get iobuf request of the same size. From the perf tests, this patch did not seem to cause any perf decrease. Note that, with this patch, the rdma performance is going to degrade drastically. In one of the previous patchsets we had fixes to not degrade rdma perf, but rdma is not supported and also not tested [1]. Hence the decision was to not have code in rdma that is not tested and not supported. [1] https://lists.gluster.org/pipermail/gluster-users.old/2018-July/034400.html Updates: #325 Change-Id: Ic2ef3bd498f9250dea25f25ba0c01fde19584b27 Signed-off-by: Poornima G <pgurusid@redhat.com>
* mem-pool: Add api to mem_get based on requested sizePoornima G2018-12-173-7/+25
| | | | | | | | | | | | | | | Currently mem-pool implementation provides api to get from the mem pool based on the struct type. This is to retain api compatibility with the old implementation of mem pool. Internally in the mem pool structure there is a mapping from struct to size based pools. In this patch, we are adding new APIs to fetch memory from mem pool, given a size. Change-Id: Ib220ee45ebd134a7be8f6482db5a592dbb7b9211 Updates: #325 Signed-off-by: Poornima G <pgurusid@redhat.com>
* fuse: add --lru-limit optionAmar Tumballi2018-12-142-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inode LRU mechanism is moot in fuse xlator (ie. there is no limit for the LRU list), as fuse inodes are referenced from kernel context, and thus they can only be dropped on request of the kernel. This might results in a high number of passive inodes which are useless for the glusterfs client, causing a significant memory overhead. This change tries to remedy this by extending the LRU semantics and allowing to set a finite limit on the fuse inode LRU. A brief history of problem: When gluster's inode table was designed, fuse didn't have any 'invalidate' method, which means, userspace application could never ask kernel to send a 'forget()' fop, instead had to wait for kernel to send it based on kernel's parameters. Inode table remembers the number of times kernel has cached the inode based on the 'nlookup' parameter. And 'nlookup' field is not used by no other entry points (like server-protocol, gfapi etc). Hence the inode_table of fuse module always has to have lru-limit as '0', which means no limit. GlusterFS always had to keep all inodes in memory as kernel would have had a reference to it. Again, the reason for this is, kernel's glusterfs inode reference was pointer of 'inode_t' structure in glusterfs. As it is a pointer, we could never free it (to prevent segfault, or memory corruption). Solution: In the inode table, handle the prune case of inodes with 'nlookup' differently, and call a 'invalidator' method, which in this case is fuse_invalidate(), and it sends the request to kernel for getting the forget request. When the kernel sends the forget, it means, it has dropped all the reference to the inode, and it will send the forget with the 'nlookup' parameter too. We just need to make sure to reduce the 'nlookup' value we have when we get forget. That automatically cause the relevant prune to happen. Credits: Csaba Henk, Xavier Hernandez, Raghavendra Gowdappa, Nithya B fixes: bz#1560969 Change-Id: Ifee0737b23b12b1426c224ec5b8f591f487d83a2 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* Multiple posix related files: several modificationsYaniv Kaul2018-12-141-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | Just looked at posix.c and related code and performed some changes and cleanups. The only important one is #3 below, but surely the others (#2 and #4) need careful review. Changes to other files are as they were related to code paths in posix.c. I'll send a separate patch for other posix related files. Main changes: 1. Proper initializtion for parameters, where it made sense. 2. Logged outside the lock in several places. 3. Moved from CALLOC to MALLOC where it made sense. 4. Aligned structures. 5. moved dictionary functions to use _sizen where possible. (dict_get() -> dict_get_sizen() for example) Compile-tested only! Change-Id: Ia84699fb495e06d095339c91c1ba770d1393bb6c updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* xlator: make 'xlator_api' mandatoryAmar Tumballi2018-12-131-7/+0
| | | | | | | | | | | | | | * Remove the options to load old symbol. * keep only 'xlator_api' symbol from being exported using xlator.sym * add xlator_api to all the xlators where its missing NOTE: This covers all the xlators which has at least a test case to validate its loading. If there is a translator, which doesn't have any test, then we should probably remove that from codebase. fixes: #164 Change-Id: Ibcdc8c9844cda6b4463d907a15813745d14c1ebb Signed-off-by: Amar Tumballi <amarts@redhat.com>
* copy_file_range support in GlusterFSRaghavendra Bhat2018-12-127-3/+149
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * libglusterfs changes to add new fop * Fuse changes: - Changes in fuse bridge xlator to receive and send responses * posix changes to perform the op on the backend filesystem * protocol and rpc changes for sending and receiving the fop * gfapi changes for performing the fop * tools: glfs-copy-file-range tool for testing copy_file_range fop - Although, copy_file_range support has been added to the upstream fuse kernel module, no release has been made yet of a kernel which contains the support. It is expected to come in the upcoming release of linux-4.20 So, as of now, executing copy_file_range fop on a fused based filesystem results in fuse kernel module sending read on the source fd and write on the destination fd. Therefore a small gfapi based tool has been written to be able test the copy_file_range fop. This tool is similar (in functionality) to the example program given in copy_file_range man page. So, running regular copy_file_range on a fuse mount point and running gfapi based glfs-copy-file-range tool gives some idea about how fast, the copy_file_range (or reflink) can be. On the local machine this was the result obtained. mount -t glusterfs workstation:new /mnt/glusterfs [root@workstation ~]# cd /mnt/glusterfs/ [root@workstation glusterfs]# ls file [root@workstation glusterfs]# cd [root@workstation ~]# time /tmp/a.out /mnt/glusterfs/file /mnt/glusterfs/new real 0m6.495s user 0m0.000s sys 0m1.439s [root@workstation ~]# time glfs-copy-file-range $(hostname) new /tmp/glfs.log /file /rrr OPEN_SRC: opening /file is success OPEN_DST: opening /rrr is success FSTAT_SRC: fstat on /rrr is success copy_file_range successful real 0m0.309s user 0m0.039s sys 0m0.017s This tool needs following arguments 1) hostname 2) volume name 3) log file path 4) source file path (relative to the gluster volume root) 5) destination file path (relative to the gluster volume root) "glfs-copy-file-range <hostname> <volume> <log file path> <source> <destination>" - Added a testcase as well to run glfs-copy-file-range tool * io-stats changes to capture the fop for profiling * NOTE: - Added conditional check to see whether the copy_file_range syscall is available or not. If not, then return ENOSYS. - Added conditional check for kernel minor version in fuse_kernel.h and fuse-bridge while referring to copy_file_range. And the kernel minor version is kept as it is. i.e. 24. Increment it in future when there is a kernel release which contains the support for copy_file_range fop in fuse kernel module. * The document which contains a writeup on this enhancement can be found at https://docs.google.com/document/d/1BSILbXr_knynNwxSyyu503JoTz5QFM_4suNIh2WwrSc/edit Change-Id: I280069c814dd21ce6ec3be00a884fc24ab692367 updates: #536 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* libglusterfs: Move devel headers under glusterfs directoryShyamsundarR2018-12-0564-0/+14313
libglusterfs devel package headers are referenced in code using include semantics for a program, this while it works can be better especially when dealing with out of tree xlator builds or in general out of tree devel package usage. Towards this, the following changes are done, - moved all devel headers under a glusterfs directory - Included these headers using system header notation <> in all code outside of libglusterfs - Included these headers using own program notation "" within libglusterfs This change although big, is just moving around the headers and making it correct when including these headers from other sources. This helps us correctly include libglusterfs includes without namespace conflicts. Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b Updates: bz#1193929 Signed-off-by: ShyamsundarR <srangana@redhat.com>