summaryrefslogtreecommitdiffstats
path: root/libglusterfs/src
Commit message (Collapse)AuthorAgeFilesLines
* protocol/server: make listen backlog value as configurableMohammed Rafi KC2017-06-082-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | problem: When we call listen from protocol/server, we are giving a hard coded valie of 10 if it is not manually given. With multiplexing, especially when glusterd restarts all clients may try to connect to the server at a time. Which will result in overflowing the queue, and kernel will complain about the errors. Solution: This patch will introduce a volume set command to make backlog value as a configurable. This patch also changes the default values for backlog from 10 to 128. This changes is only applicable for sockets listening from protocol. Example: gluster volume set <volname> transport.listen-backlog 1024 Note: 1 Brick has to be restarted to get this value in effect 2 This changes won't be reflected in glusterd, or other xlators which calls listen. If you need, you have to add this option to the volfile. Change-Id: I0c5a2bbf28b5db612f9979e7560e05dd82b41477 BUG: 1456405 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: https://review.gluster.org/17411 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* core: add more information on dictionary usageAmar Tumballi2017-06-055-0/+58
| | | | | | | | | | | | | | | | | | | | | | when you take the 'statedump', it shows the output like below ----- [dict] max-number-of-dict-pairs=13 total-pairs-used=41613 total-dict-used=12629 average-pairs-per-dict=3 ------ Updates #220 Change-Id: I71a7eda3a3cd23edf4483234f22f983923bbb081 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: https://review.gluster.org/4035 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* dict: add a simple hash comparision of keys before strcmp for performanceAmar Tumballi2017-06-012-17/+35
| | | | | | | | | | | | | Updates #220 Change-Id: I03b1d2fac2dfcdd21bdf4e4fff19d49425699931 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: https://review.gluster.org/6450 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: Jeff Darcy <jeff@pl.atyp.us> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* glusterfs: Not able to mount running volume after enable brick mux and ↵Mohit Agrawal2017-05-311-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | stopped any volume Problem: After enabled brick mux if any volume has down and then try ot run mount with running volume , mount command is hung. Solution: After enable brick mux server has shared one data structure server_conf for all associated subvolumes.After down any subvolume in some ungraceful manner (remove brick directory) posix xlator sends GF_EVENT_CHILD_DOWN event to parent xlatros and server notify updates the child_up to false in server_conf.When client is trying to communicate with server through mount it checks conf->child_up and it is FALSE so it throws message "translator are not yet ready". From this patch updated structure server_conf to save child_up status for xlator wise. Another improtant correction from this patch is cleanup threads from server side xlators after stop the volume. BUG: 1453977 Change-Id: Ic54da3f01881b7c9429ce92cc569236eb1d43e0d Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: https://review.gluster.org/17356 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* libglusterfs: updates old comment for 'arena_size'Ji-Hyeon Gim2017-05-281-3/+3
| | | | | | | | | | | | | | | | the comment (libglusterfs/src/iobuf.h:88-90) for 'arena_size' field in 'struct iobuf_arena' is not valid anymore. According to line 190 in __iobuf_arena_alloc() no longer follows that equation. Change-Id: I68558164b309123cf19093e2da89bc156df294fd BUG: 1455831 Signed-off-by: Ji-Hyeon Gim <potatogim@gluesys.com> Reviewed-on: https://review.gluster.org/17393 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Shyamsundar Ranganathan <srangana@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* libglusterfs : Fix crash in glusterd while peer probingGaurav Yadav2017-05-261-5/+5
| | | | | | | | | | | | | | | | | | | | | | glusterd crashes when port is being set explcitly to a range which is outside greater than short data type range. Eg. sysctl net.ipv4.ip_local_reserved_ports="49152-49156" In above case glusterd crashes while parsing the port. With this fix glusterd will be able to handle port range between INT_MIN to INT_MAX Change-Id: I7c75ee67937b0e3384502973d96b1c36c89e0fe1 BUG: 1454418 Signed-off-by: Gaurav Yadav <gyadav@redhat.com> Reviewed-on: https://review.gluster.org/17359 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* libglusterfs: extract some functionality to functionsCsaba Henk2017-05-233-32/+215
| | | | | | | | | | | | | | | | | | | | | | | | | | - code in run.c to close all file descriptors, except for specified ones is extracted to int close_fds_except (int *fdv, size_t count); - tokenizing and editing a string that consists of comma-separated tokens (as done eg. in mount_param_to_flag() of contrib/fuse/mount.c is abstacted into the following API: char *token_iter_init (char *str, char sep, token_iter_t *tit); gf_boolean_t next_token (char **tokenp, token_iter_t *tit); void drop_token (char *token, token_iter_t *tit); Updates #153 Change-Id: I7cb5bda38f680f08882e2a7ef84f9142ffaa54eb Signed-off-by: Csaba Henk <csaba@redhat.com> Reviewed-on: https://review.gluster.org/17229 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* cluster/ec: Implement FALLOCATE FOP for ECSunil Kumar Acharya2017-05-231-0/+8
| | | | | | | | | | | | | | | FALLOCATE file operations is not implemented in the existing EC code. This change set implements it for EC. BUG: 1448293 Change-Id: Id9ed914db984c327c16878a5b2304a0ea461b623 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com> Reviewed-on: https://review.gluster.org/15200 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* nl-cache: In case of nameless operations do not cachePoornima G2017-05-222-0/+16
| | | | | | | | | | | | | | | | | | Issue: In nameless lookup/other fops, parent inode will be NULL, when we try to add the cache to the NULL inode, it causes a crash. Hence handle the scenario of nameless fops, and do not cache/serve the nameless fops. Change-Id: I3b90f882ac89e6aaf3419db89e6f890797f37700 BUG: 1451588 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17316 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterfsd: send PARENT_UP on brick attachAtin Mukherjee2017-05-142-3/+6
| | | | | | | | | | | | | | | | With brick multiplexing being enabled, if a brick is instance attached to a process then a PARENT_UP event is needed so that it reaches right till posix layer and then from posix CHILD_UP event is sent back to all the children. Change-Id: Ic341086adb3bbbde0342af518e1b273dd2f669b9 BUG: 1447389 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: https://review.gluster.org/17225 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* gfapi: fix handling of dot and double dot in pathMohammed Rafi KC2017-05-142-0/+127
| | | | | | | | | | | | | | | | This patch is to handle "." and ".." in file path. Which means this special dentry names will be resolved before sending fops on the path. Change-Id: I5e92f6d1ad1412bf432eb2488e53fb7731edb013 BUG: 1447266 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: https://review.gluster.org/17177 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* event/epoll: Add back socket for polling of events immediately afterRaghavendra G2017-05-125-48/+109
| | | | | | | | | | | | | | | | | | | | | | | | | reading the entire rpc message from the wire Currently socket is added back for future events after higher layers (rpc, xlators etc) have processed the message. If message processing involves signficant delay (as in writev replies processed by Erasure Coding), performance takes hit. Hence this patch modifies transport/socket to add back the socket for polling of events immediately after reading the entire rpc message, but before notification to higher layers. credits: Thanks to "Kotresh Hiremath Ravishankar" <khiremat@redhat.com> for assitance in fixing a regression in bitrot caused by this patch. Change-Id: I04b6b9d0b51a1cfb86ecac3c3d87a5f388cf5800 BUG: 1448364 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: https://review.gluster.org/15036 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* libglusterfs: fix race condition in client_ctx_setZhou Zhengping2017-05-122-34/+44
| | | | | | | | | | | | | | | | | | | follow procedures: 1.thread1 client_ctx_get return NULL 2.thread 2 client_ctx_set ctx1 ok 3.thread1 client_ctx_set ctx2 ok thread1 use ctx1, thread2 use ctx2 and ctx1 will leak Change-Id: I990b02905edd1b3179323ada56888f852d20f538 BUG: 1449232 Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com> Reviewed-on: https://review.gluster.org/17219 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* libglusterfs: include sys/time.h avoid compiling error on MacOSXKinglong Mee2017-05-091-0/+1
| | | | | | | | | | | | Change-Id: I3d30bacc3d5d085220dd85a3919207deef8bd1dd Signed-off-by: Kinglong Mee <mijinlong@open-fs.com> Reviewed-on: https://review.gluster.org/17114 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> Tested-by: Prashanth Pai <ppai@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* libglusterfs/graph: fix potential endless loop in options validate workZhou Zhengping2017-05-091-1/+3
| | | | | | | | | | | | Change-Id: Icb71ded6051afe44e07480e0499d2a39f05fac71 BUG: 1447826 Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com> Reviewed-on: https://review.gluster.org/17171 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* glusterd: socketfile & pidfile related fixes for brick multiplexing featureMohit Agrawal2017-05-092-10/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: While brick-muliplexing is on after restarting glusterd, CLI is not showing pid of all brick processes in all volumes. Solution: While brick-mux is on all local brick process communicated through one UNIX socket but as per current code (glusterd_brick_start) it is trying to communicate with separate UNIX socket for each volume which is populated based on brick-name and vol-name.Because of multiplexing design only one UNIX socket is opened so it is throwing poller error and not able to fetch correct status of brick process through cli process. To resolve the problem write a new function glusterd_set_socket_filepath_for_mux that will call by glusterd_brick_start to validate about the existence of socketpath. To avoid the continuous EPOLLERR erros in logs update socket_connect code. Test: To reproduce the issue followed below steps 1) Create two distributed volumes(dist1 and dist2) 2) Set cluster.brick-multiplex is on 3) kill glusterd 4) run command gluster v status After apply the patch it shows correct pid for all volumes BUG: 1444596 Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: https://review.gluster.org/17101 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* libglusterfs: stop special casing "cache-size" in size_t validationCsaba Henk2017-05-081-16/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original situation was as follows: The function that validates xlator options indicating a size, xlator_option_validate_sizet(), handles the case when the name of the option is "cache-size" in a special way. - Xlator options (things of type volume_option_t) has a min and max attribute of type double. - An xlator option is endowed with a gluster specific type (not C type). An instance of an xlator option goes through a validation process by a type specific validator function (which are collected in option.c). - Validators of numeric types - size being one of them - make use the min and max attributes to perform a range check, except in one case: if an option is defined with min = max = 0, then this option will be exempt of range checking. (Note: the volume_option_t definition features the following comments along the min, max fields: double min; /* 0 means no range */ double max; /* 0 means no range */ which is slightly misleading as it lets one to conclude that zeroing min or max buys exemption from low or high boundary check, which is not true -- only *both* being zero buys exemption.) - Besides this, the validator for options of size type, xlator_option_validate_sizet() special cases options named "cache-size" so that only min is enforced. (The only consequence of a value exceeding max is that glusterd logs a warning about it, but the cli user who makes such a setting gets no feedback on it.) - This was introduced because a hard coded limit is not useful for io-cache and quick-read. They rather use a runtime calculated upper limit. (See changes I7dd4d8c53051b89a293696abf1ee8dc237e39a20 I9c744b5ace10604d5a814e6218ca0d83c796db80 about the last two points.) - As an unintended consequence, the upper limit check of cache-size of write-behind, for which a conventional hard coded limit is specified, is defeated. What we do about it: - Remove the special casing clause for cache-size in xlator_option_validate_sizet. Thus the general range check policy (as described above) will apply to cache-size too. - To implement a lower bound only check by the validator for cache-size of io-cache and quick-read, change the max attribute of these options to INFINITY. The only behavioral difference is the omission of the warnings about cache-size of io-cache and quick-read exceeding the former max values. (They were rather heuristic anyway.) BUG: 1445609 Change-Id: I0bd8bd391fa7d926f76e214a2178833fe4673b4a Signed-off-by: Csaba Henk <csaba@redhat.com> Reviewed-on: https://review.gluster.org/17125 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* rpc: Remove accidental IPV6 changesKaushal M2017-05-051-4/+0
| | | | | | | | | | | | | | They snuck in with the HALO patch (07cc8679c) Change-Id: I8ced6cbb0b49554fc9d348c453d4d5da00f981f6 BUG: 1447953 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: https://review.gluster.org/17174 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* fd: We should use dict_set_uint64 to store fd->pidZhou Zhengping2017-05-041-1/+1
| | | | | | | | | | | | Change-Id: I136372b9929d3ecf243649b6945571a67bfd80eb BUG: 1447828 Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com> Reviewed-on: https://review.gluster.org/17172 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Amar Tumballi <amarts@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* SELinux : implementation of SELinux translatorManikandan Selvaganesh2017-05-031-1/+7
| | | | | | | | | | | | | | | | | | | | The patch implement a part of SELinux translator to support setting SELinux contexts on files in a glusterfs volume. URL: https://github.com/gluster/glusterfs-specs/blob/master/accepted/SELinux-client-support.md Change-Id: Id8916bd8e064ccf74ba86225ead95f86dc5a1a25 BUG: 1318100 Fixes : #55 Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com> Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/13762 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Manikandan Selvaganesh <manikandancs333@gmail.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* Halo Replication feature for AFR translatorKevin Vigor2017-05-028-16/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Halo Geo-replication is a feature which allows Gluster or NFS clients to write locally to their region (as defined by a latency "halo" or threshold if you like), and have their writes asynchronously propagate from their origin to the rest of the cluster. Clients can also write synchronously to the cluster simply by specifying a halo-latency which is very large (e.g. 10seconds) which will include all bricks. In other words, it allows clients to decide at mount time if they desire synchronous or asynchronous IO into a cluster and the cluster can support both of these modes to any number of clients simultaneously. There are a few new volume options due to this feature: halo-shd-latency: The threshold below which self-heal daemons will consider children (bricks) connected. halo-nfsd-latency: The threshold below which NFS daemons will consider children (bricks) connected. halo-latency: The threshold below which all other clients will consider children (bricks) connected. halo-min-replicas: The minimum number of replicas which are to be enforced regardless of latency specified in the above 3 options. If the number of children falls below this threshold the next best (chosen by latency) shall be swapped in. New FUSE mount options: halo-latency & halo-min-replicas: As descripted above. This feature combined with multi-threaded SHD support (D1271745) results in some pretty cool geo-replication possibilities. Operational Notes: - Global consistency is gaurenteed for synchronous clients, this is provided by the existing entry-locking mechanism. - Asynchronous clients on the other hand and merely consistent to their region. Writes & deletes will be protected via entry-locks as usual preventing concurrent writes into files which are undergoing replication. Read operations on the other hand should never block. - Writes are allowed from _any_ region and propagated from the origin to all other regions. The take away from this is care should be taken to ensure multiple writers do not write the same files resulting in a gfid split-brain which will require resolution via split-brain policies (majority, mtime & size). Recommended method for preventing this is using the nfs-auth feature to define which region for each share has RW permissions, tiers not in the origin region should have RO perms. TODO: - Synchronous clients (including the SHD) should choose clients from their own region as preferred sources for reads. Most of the plumbing is in place for this via the child_latency array. - Better GFID split brain handling & better dent type split brain handling (i.e. create a trash can and move the offending files into it). - Tagging in addition to latency as a means of defining which children you wish to synchronously write to Test Plan: - The usual suspects, clang, gcc w/ address sanitizer & valgrind - Prove tests Reviewers: jackl, dph, cjh, meyering Reviewed By: meyering Subscribers: ethanr Differential Revision: https://phabricator.fb.com/D1272053 Tasks: 4117827 Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1 BUG: 1428061 Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: http://review.gluster.org/16099 Reviewed-on: https://review.gluster.org/16177 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* core: make the per glusterfs_ctx_t timer-wheel refcountedNiels de Vos2017-05-017-57/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | xlators can use a 'global' timer-wheel for scheduling events. This timer-wheel is managed per glusterfs_ctx_t, but does not need to be allocated for every graph. When an xlator wants to use the timer-wheel, it will be instanciated on demand, and provided to xlators that request it later on. By adding a reference counter to the glusterfs_ctx_t for the timer-wheel, the threads and structures can be cleaned up when the last xlator does not have a need for it anymore. In general, the xlators request the timer-wheel in init(), and they should return it in fini(). Because the timer-wheel is managed per glusterfs_ctx_t, the functions can be added to ctx.c and do not need to live in their very minimal tw.[ch] files. Change-Id: I19d225b39aaa272d9005ba7adc3104c3764f1572 BUG: 1442788 Reported-by: Poornima G <pgurusid@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/17068 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Zhou Zhengping <johnzzpcrystal@gmail.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* dht: send lookup on old name inside rename with bname and pargfidSusant Palai2017-04-291-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Inside rename, a lookup is done on the source name to make sure that the file is there. But we used to do a gfid based lookup and hence, even if the source name was renamed to a new name from some other client, lookup will be successful as server3_3_lookup will fetch the new path based on the gfid. So even if the source file does not exist any more rename will carry on, and as server3_3_link(destination is hashed to a different brick other than source cached scenario) also does gfid based resolve, it wont detect that the source name does not exist and hardlink creation will be successful (since gfid based resolve will get the new dentry). To solve this problem, do a name based lookup inside rename. So that rename will fail right away if the source does not exist. Change-Id: Ieba8bdd6675088dbf18de90ed4622df043d163bd BUG: 1412135 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/16375 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: rebalance perf enhancementSusant Palai2017-04-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Throttle settings "normal" and "aggressive" for rebalance did not have performance difference. normal mode spawns $(no. of cores - 4)/2 threads and aggressive spawns $(no. of cores - 4) threads. Though aggressive mode has twice the number of threads compared to that of normal mode, there was no performance gain when switched to aggressive mode from normal mode. RCA: During the course of debugging the above problem, we tried assigning migration job to migration threads spawned by rebalance, rather than synctasks(as there is more overhead associated to manage the task queue and threads). This gave us a significant improvement over rebalance under synctasks. This patch does not really gurantee that there will be a clear performance difference between normal and aggressive mode, but this patch certainly maximized the disk utilization for 1GBfiles run. Results: Test enviroment: Gluster Config: Number of Bricks: 2 (one brick per disk(RAID-6 12 disk)) Bricks: Brick1: server1:/brick/test1/1 Brick2: server2:/brick/test1/1 Options Reconfigured: performance.readdir-ahead: on server.event-threads: 4 client.event-threads: 4 1000 files with 1GB each were created/renamed such that all files will have server1 as cached and server2 as hashed, so that all files will be migrated. Test machines had 24 cores each. Results with/without synctask based migration: ----------------------------------------------- mode normal(10threads) aggressive(20threads) timetaken 0:55:30 (h:m:s) 0:56:3 (h:m:s) withsynctask timetaken with migrator 0:38:3 (h:m:s) 0:23:41 (h:m:s) threads From above table it can be seen that, there is a clear 2x perf gain between rebalance with synctask vs rebalance with migrator threads. Additionally this patch modifies the code so that caller will have the exact error number returned by dht_migrate_file(earlier the errno meaning was overloaded). This will help avoiding scenarios where migration failure due to ENOENT, can result in rebalance abort/failure. Change-Id: I8904e2fb147419d4a51c1267be11a08ffd52168e BUG: 1420166 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/16427 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* libglusterfs/stack.h: reduce duplication of codeAmar Tumballi2017-04-271-107/+20
| | | | | | | | | | | | | | | | * Use STACK_UNWIND_STRICT everywhere. * Provide STACK_WIND_COMMON as both STACK_WIND_COOKIE and STACK_WIND differ by just 1 line and 1 option. Updates gluster/glusterfs#137 Change-Id: Ifbb6b9c4702b02f4a02834824f509fd10c78f0ce Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: https://review.gluster.org/16915 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* feature/dht: Directory synchronizationKotresh HR2017-04-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Design doc: https://review.gluster.org/16876 Directory creation is now synchronized with blocking inodelk of the parent on the hashed subvolume followed by the entrylk on the hashed subvolume between dht_mkdir, dht_rmdir, dht_rename_dir and lookup selfheal mkdir. To maintain internal consistency of directories across all subvols of dht, we need locks. Specifically we are interested in: 1. Consistency of layout of a directory. Only one writer should modify the layout at a time. A writer (layout setting during directory heal as part of lookup) shouldn't modify the layout while there are readers (all other fops like create, mkdir etc., which consume layout) and readers shouldn't read the layout while a writer is in progress. Readers can read the layout simultaneously. Writer takes a WRITE inodelk on the directory (whose layout is being modified) across ALL subvols. Reader takes a READ inodelk on the directory (whose layout is being read) on ANY subvol. 2. Consistency of directory namespace across subvols. The path and associated gfid should be same on all subvols. A gfid should not be associated with more than one path on any subvol. All fops that can change directory names (mkdir, rmdir, renamedir, directory creation phase in lookup-heal) takes an entrylk on hashed subvol of the directory. NOTE1: In point 2 above, since dht takes entrylk on hashed subvol of a directory, the transaction itself is a consumer of layout on parent directory. So, the transaction is a reader of parent layout and does an inodelk on parent directory just like any other layout reader. So a mkdir (dir/subdir) would: > Acquire a READ inodelk on "dir" on any subvol. > Acquire an entrylk (dir, "subdir") on hashed subvol of "subdir". > creates directory on hashed subvol and possibly on non-hashed subvols. > UNLOCK (entrylk) > UNLOCK (inodelk) NOTE2: mkdir fop while setting the layout of the directory being created is considered as a reader, but NOT a writer. The reason is for a fop which can consume the layout of a directory to come either of the following conditions has to be true: > mkdir syscall from application has to complete. In this case no need of synchronization. > A lookup issued on the directory racing with mkdir has to complete. Since layout setting by a lookup is considered as a writer, only one of either mkdir or lookup will set the layout. Code re-organization: All the lock related routines are moved to "dht-lock.c" file. New wrapper function is introduced to take blocking inodelk followed by entrylk 'dht_protect_namespace' Updates #191 Change-Id: I01569094dfbe1852de6f586475be79c1ba965a31 Signed-off-by: Kotresh HR <khiremat@redhat.com> BUG: 1443373 Reviewed-on: https://review.gluster.org/15472 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* libglusterfs: accept random volname in glusterfs_graph_prepare()Niels de Vos2017-04-261-27/+52
| | | | | | | | | | | | | | | | | | When the call to glfs_new("volname") passes a name for the volume and it does not match the name of the subvolume in the graph, glfs_init() will fail. This is easily reproducible by a gfapi program that loads the volume from a .vol file, and not from a GlusterD server. Change-Id: I33e77fbee7d12eaefe7c384fad6aecfa3582ea5a BUG: 1425623 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/16796 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* remove redundant function definitionJungyeon Yoon2017-04-251-3/+0
| | | | | | | | | | | | | | | | | function definition is duplicated erase second sys_ftruncate() definition No test cases Change-Id: I3eead1380b527b8b0e480f45b39e0c4bc9b2b92a Signed-off-by: Jungyeon Yoon <jungyeon.yoon@gmail.com> Reviewed-on: https://review.gluster.org/17106 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: Prashanth Pai <ppai@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* Implement negative lookup cachePoornima G2017-04-202-0/+7
| | | | | | | | | | | | | | | | | | | | | Before creating any file negative lookups(1 in Fuse, 4 in SMB etc.) are sent to verify if the file already exists. By serving these lookups from the cache when possible, increases the create performance by multiple folds in SMB access and some percentage in Fuse/NFS access. Feature page: https://review.gluster.org/#/c/16436 Updates #82 Change-Id: Ib1c0e7ac7a386f943d84f6398c27f9a03665b2a4 BUG: 1442569 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/16952 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* dht: Add readdir-ahead in rebalance graph if parallel-readdir is onPoornima G2017-04-181-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: The value of linkto xattr is generally the name of the dht's next subvol, this requires that the next subvol of dht is not changed for the life time of the volume. But with parallel readdir enabled, the readdir-ahead loaded below dht, is optional. The linkto xattr for first subvol, when: - parallel readdir is enabled : "<volname>-readdir-head-0" - plain distribute volume : "<volname>-client-0" - distribute replicate volume : "<volname>-afr-0" The value of linkto xattr is "<volname>-readdir-head-0" when parallel readdir is enabled, and is "<volname>-client-0" if its disabled. But the dht_lookup takes care of healing if it cannot identify which linkto subvol, the xattr points to. In dht_lookup_cbk, if linkto xattr is found to be "<volname>-client-0" and parallel readdir is enabled, then it cannot understand the value "<volname>-client-0" as it expects "<volname>-readdir-head-0". In that case, dht_lookup_everywhere is issued and then the linkto file is unlinked and recreated with the right linkto xattr. The issue is when parallel readdir is enabled, mount point accesses the file that is currently being migrated. Since rebalance process doesn't have parallel-readdir feature, it expects "<volname>-client-0" where as mount expects "<volname>-readdir-head-0". Thus at some point either the mount or rebalance will fail. Solution: Enable parallel-readdir for rebalance as well and then do not allow enabling/disabling parallel-readdir if rebalance is in progress. Change-Id: I241ab966bdd850e667f7768840540546f5289483 BUG: 1436090 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17056 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* rpc: add options to manage socket keepalive lifespanMilind Changire2017-04-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Default values for handling socket timeouts for brick responses are insufficient for aggressive applications such as databases. Solution: Add 1:1 gluster options for keepalive, keepalive-idle, keepalive-interval and keepalive-timeout as per the socket level options available as per tcp(7) man page. Default values for options are NOT agressive and continue to be values which result in default timeout when only the keep alive option is turned on. These options are Linux specific and will not be applicable to the *BSDs. Change-Id: I2a08ecd949ca8ceb3e090d336ad634341e2dbf14 BUG: 1426059 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: https://review.gluster.org/16731 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* mem-pool: use gf_atomic_t for atomic countersNiels de Vos2017-04-102-8/+13
| | | | | | | | | | | | | | | | Reduce the usage of __sync_fetch_and_add() builtins in mem-pool. The new gf_atomic_t type can be used instead, so that the architecture and compiler specific builtins are hidden from the mem-pool implementation. BUG: 1437037 Change-Id: Icbeeb187dd2b835b35f32f54f821ceddfc7b2638 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/17012 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* xlator: do not call dlclose() when debuggingNiels de Vos2017-04-073-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Valgrind can not show the symbols if a .so after calling dlclose(). The unhelpful ??? in the output gets resolved properly with this change: ==25170== 344 bytes in 1 blocks are definitely lost in loss record 233 of 324 ==25170== at 0x4C29975: calloc (vg_replace_malloc.c:711) ==25170== by 0x52C7C0B: __gf_calloc (mem-pool.c:117) ==25170== by 0x12B0638A: ??? ==25170== by 0x528FCE6: __xlator_init (xlator.c:472) ==25170== by 0x528FE16: xlator_init (xlator.c:498) ==25170== by 0x52DA8D6: glusterfs_graph_init (graph.c:321) ==25170== by 0x52DB587: glusterfs_graph_activate (graph.c:695) ==25170== by 0x5046407: glfs_process_volfp (glfs-mgmt.c:79) ==25170== by 0x5043B9E: glfs_volumes_init (glfs.c:281) ==25170== by 0x5044FEC: glfs_init_common (glfs.c:986) ==25170== by 0x50451A7: glfs_init@@GFAPI_3.4.0 (glfs.c:1031) By not calling dlclose(), the dynamically loaded .so is still available upon program exit, and Valgrind is able to resolve the symbols. This will add an additional leak, so dlclose() is called for normal builds, but skipped when configuring with "./configure --enable-valgrind" or passing the "run-with-valgrind" xlator option. URL: http://valgrind.org/docs/manual/faq.html#faq.unhelpful Change-Id: I2044e21b1b8fcce32ad1a817fdd795218f967731 BUG: 1425623 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/16809 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* libglusterfs: provide standardized atomic operationsNiels de Vos2017-04-059-58/+143
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The current macros ATOMIC_INCREMENT() and ATOMIC_DECREMENT() expect a lock as first argument. There are at least two issues with this approach: 1. this lock is unused on architectures that have atomic operations 2. some structures use a single lock for multiple variables By defining a gf_atomic_t type, the unused lock can be removed, saving a few bytes on modern architectures. Because the gf_atomic_t type locates the lock for the variable (in case of older architectures), each variable is protected the same on all architectures. This makes the behaviour across all architectures more equal (per variable locking, by a gf_lock_t or compiler optimization). BUG: 1437037 Change-Id: Ic164892b06ea676e6a9566f8a98b7faf0efe76d6 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/16963 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* build: clang has __builtin_popcount() and __builtin_ffs()Kaleb S. KEITHLEY2017-04-051-2/+6
| | | | | | | | | | | | | | | | | Note: Even though gcc(1) will automatically treat ffs() and popcount() as built-in, calling them explicitly as __builtin in the source helps make it easier to find them. (And if no __builtin_ffs use the one in libc.) Change-Id: Ib74d9b221ff03a01df5ad05907024da1a83a7a88 BUG: 1438772 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: https://review.gluster.org/16993 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* libglusterfs: add dict_rename_key()Manikandan Selvaganesh2017-03-312-0/+29
| | | | | | | | | | | | | | | | | | | The dict_rename_key() function will be used for converting the "security.selinux" xattr to "trusted.gluster.selinux" in the upcoming SELinux xlator. BUG: 1318100 Change-Id: Ic5d0b9127e2c360d355f02e200a820597e83fa2c Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com> Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> [ndevos: split from change Id8916bd8e064ccf74ba86225ead95f86dc5a1a25] Reviewed-on: https://review.gluster.org/16616 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* syncop: don't wake task in synctask_wake unless really neededRavishankar N2017-03-283-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In EC and AFR, we launch synctasks during self-heal. (i) These tasks usually stackwind a FOP to all its children and call synctask_yield() which does a swapcontext to synctask_switchto() and puts the task in syncenv's waitq by calling __wait(task). This happends as long as the FOP ckbs from all children haven't been received. (ii) For each FOP cbk, we call synctask_wake() which again does a swapcontext to synctask_switchto() which now puts the task in syncenv's runq by calling __run(task). When the task runs and the conext switches back to the FOP path, it puts the task in waitq because we haven't heard from all children as explained in (i). Thus we are unnecessarily using the swapcontext syscalls to just toggle the task back and forth between the waitq and runq. Fix: Store the stackwind count in new variable 'syncbarrier->waitfor' before winding the fop. In each cbk when we call synctask_wake(), perform an actual wake only if the cbk count == stackwind count. Change-Id: Id62d3b6ffed5a8c50f8b79267fb34e9470ba5ed5 BUG: 1434274 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: https://review.gluster.org/16931 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* syncop: Fix args for makecontextPranith Kumar K2017-03-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Passing 64bit arguments to makecontext is not portable and manpage says the following: <snip> On architectures where int and pointer types are the same size (e.g., x86-32, where both types are 32 bits), you may be able to get away with passing pointers as argu‐ ments to makecontext() following argc. However, doing this is not guaranteed to be portable, is undefined according to the standards, and won't work on architectures where pointers are larger than ints. Nevertheless, starting with version 2.8, glibc makes some changes to makecontext(), to permit this on some 64-bit architectures (e.g., x86-64). </snip> Since we do not depend on the arguments, it is better to change makecontext to not take any arguments. BUG: 1434274 Change-Id: Ic46c9e9faaeb2f78e4efde353ef861466515b1ec Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://review.gluster.org/16951 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* libglusterfs: Initialize old sigsetRavishankar N2017-03-271-2/+1
| | | | | | | | | | | | | | | | ...in gf_thread_create(). Technically, since we pass it as an argument to pthread_sigmask, initialization is not needed but doing it as a good practice. Change-Id: Ie069af07cb07c1784f3841e1fc628ca13dfdcef4 BUG: 1434274 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/16929 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ashish Pandey <aspandey@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* core: fix synclocks' handling of "woken" flagJeff Darcy2017-03-231-2/+3
| | | | | | | | | | | | | | | | The "woken" flag wasn't being reset when it should have been, leading (eventually) to a SEGV when someone tried to folow a synclock's waitq to a task structure that had been freed while still on the queue. See the bug report for (far) more detail. Change-Id: I5cd9ae1bcb831555274108b292181ec2a29b6d95 BUG: 1434062 Signed-off-by: Jeff Darcy <jeff@pl.atyp.us> Reviewed-on: https://review.gluster.org/16926 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* refcount: correct the return value of GF_REF_PUT()Niels de Vos2017-03-212-3/+7
| | | | | | | | | | | | | | | | | It is documented that GF_REF_PUT() returns a 0 in case the call resulted in free'ing the structure. However the implementations did not have a return value, so nothing can actually use it. Change-Id: Ic57091f5ddd7e0b80929dc335a5b6d37f5fe1b2e BUG: 1433405 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/16910 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* syncop: fix argc count in call to makecontext()Ravishankar N2017-03-211-1/+1
| | | | | | | | | | | | | | We are only passing one argument (a pointer to struct synctask) to the function, so argc must be 1 and not 2. Change-Id: I4eaadd58a76f32327d8bb3efa9c5c435700d7391 BUG: 1434274 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/16930 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* libglusterfs : update correct message segments in glfs-message-idmvignesh@redhat.com2017-03-131-1/+1
| | | | | | | | | | | | | Change-Id: I6611c5699303c879e6f5d88549f5101dd6f63e46 BUG: 1384989 Signed-off-by: Muthu-vigneshwaran <mvignesh@redhat.com> Reviewed-on: https://review.gluster.org/15644 Tested-by: Muthu Vigneshwaran <muthuvigneshwaran77@gmail.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Manikandan Selvaganesh <manikandancs333@gmail.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterfsd+libglusterfs: add null checks during attachJeff Darcy2017-03-091-0/+4
| | | | | | | | | | | | | | | | It's possible (though unlikely) that we could get a brick-attach request while we're not ready to process it (ctx->active not set yet). Add code to guard against this possibility, and return appropriate error indicators. Change-Id: Icb3bc52ce749258a3f03cbbbdf4c2320c5c541a0 BUG: 1430860 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: https://review.gluster.org/16883 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* posix: use nanosecond accuracy when availableNiklas Hambüchen2017-03-074-6/+49
| | | | | | | | | | | | | | | | | | | | | | | | Programs that set mtime, such as `rsync -a`, don't work correctly on GlusterFS, because it sets the nanoseconds to 000. This creates problems for incremental backups, where files get accidentally copied again and again. For example, consider `myfile` on an ext4 system, being copied to a GlusterFS volume, with `rsync -a` and then `cp -u` in turn. You'd expect that after the first `rsync -a`, `cp -u` agrees that the file need not be copied. BUG: 1422074 Change-Id: I89c7b6a73e2e06c02851ff76b7e5cdfaa271e985 Signed-off-by: Niklas Hambüchen <mail@nh2.me> Reviewed-on: https://review.gluster.org/16667 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Jeff Darcy <jdarcy@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/ec: Introduce optimistic changelog in ECPranith Kumar K2017-03-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Fix to https://bugzilla.redhat.com/show_bug.cgi?id=1316873 has made changes to set dirty flag before every update fop, data or metadata, and unset it after successful operation. That makes some of the fops very slow such as entry operations or metadata operations. Solution: File data operations are the only operation which take some time and setting dirty flag before a fop and unsetting it after serves the purpose as probability of failure of a fop is high when the time duration is more. For all the other operations, set dirty flag at the end of the fop, if any brick is down and need heal. Providing following option to choose between high performance or better heal marking for metadata and entry fops. Set/Unset dirty flag for every update fop at the start of the fop. If ON, this option impacts performance of entry operations or metadata operations as it will set dirty flag at the start and unset it at the end of ALL update fop. If OFF and all the bricks are good, dirty flag will be set at the start only for file fops For metadata and entry fops dirty flag will not be set at the start, if all the bricks are good. This does not impact performance for metadata operations and entry operation but has a very small window to miss marking entry as dirty in case it is required to be healed. Thanks to Xavi and Ashish for the design Picked the .t file from Ashish' patch https://review.gluster.org/16298 BUG: 1408809 Change-Id: I3ce860063f0e2901e50754dcfc3e4ed22daf819f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://review.gluster.org/16821 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Xavier Hernandez <xhernandez@datalab.es> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* Free arg_save when malloc failMichael Scherer2017-02-281-0/+1
| | | | | | | | | | | | | | Warning found by coverity. Change-Id: Ie755659c33a43a440dadfeb1499a2f6c08e3f625 BUG: 789278 Signed-off-by: Michael Scherer <misc@redhat.com> Reviewed-on: https://review.gluster.org/16788 Tested-by: Michael Scherer <misc@fedoraproject.org> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/ec: Don't trigger data/metadata heal on LookupsPranith Kumar K2017-02-261-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem-1 If Lookup which doesn't take any locks observes version mismatch it can't be trusted. If we launch a heal based on this information it will lead to self-heals which will affect I/O performance in the cases where Lookup is wrong. Considering self-heal-daemon and operations on the inode from client which take locks can still trigger heal we can choose to not attempt a heal on Lookup. Problem-2: Fixed spurious failure of tests/bitrot/bug-1373520.t For the issues above, what was happening was that ec_heal_inspect() is preventing 'name' heal to happen Problem-3: tests/basic/ec/ec-background-heals.t To be honest I don't know what the problem was, while fixing the 2 problems above, I made some changes to ec_heal_inspect() and ec_need_heal() after which when I tried to recreate the spurious failure it just didn't happen even after a long time. BUG: 1414287 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Change-Id: Ife2535e1d0b267712973673f6d474e288f3c6834 Reviewed-on: https://review.gluster.org/16468 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ashish Pandey <aspandey@redhat.com>
* events: use attribute(format(/printf)) to catch fmt string errorsKaleb S. KEITHLEY2017-02-265-9/+11
| | | | | | | | | | | | | and statedump too. Also "const char *" (versus just "char *") for the fmt param. Change-Id: Ic63734a673208a2cd49aebccce7659816e6179e3 BUG: 1399196 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: https://review.gluster.org/15881 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* libglusterfs, gfdb, glusterfs: Add missing breaksNigel Babu2017-02-263-0/+7
| | | | | | | | | | | | | | | | A few switches did not have breaks causing fall throughs. Most of them have been fixed with fall through comments for those that are intentional. Change-Id: I84c85726b542f38504b50fefab5eba5dbcd27a07 BUG: 1424894 Signed-off-by: Nigel Babu <nigelb@redhat.com> Reviewed-on: https://review.gluster.org/16677 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>