summaryrefslogtreecommitdiffstats
path: root/libglusterfs/src
Commit message (Collapse)AuthorAgeFilesLines
* inode: Add per xlator ref count for inodePoornima G2017-01-182-16/+64
| | | | | | | | | | | | | | | | | Debugging inode ref leaks is very difficult as there is no info except for the ref count on the inode. Hence this patch is first step towards debugging inode ref leaks. With this patch, in the statedump we get additional info that tells the ref count taken by each xlator on the inode. Change-Id: I7802f7e7b13c04eb4d41fdf52d5469fd2c2a185a BUG: 1325531 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/13736 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* tier : Tier as a servicehari gowtham2017-01-161-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tierd is implemented by separating from rebalance process. The commands affected: 1) Attach tier will trigger this process instead of old one 2) tier start and tier start force will also trigger this process. 3) volume status [tier] will show tier daemon as a process instead of task and normal tier status and tier detach status works. 4) tier stop implemented. 5) detach tier implemented separately along with new detach tier status 6) volume tier volname status will work using the changes. 7) volume set works This patch has separated the tier translator from the legacy DHT rebalance code. It now sends the RPCs from the CLI to glusterd separate to the DHT rebalance code. The daemon is now a service, similar to the snapshot daemon, and can be viewed using the volume status command. The code for the validation and commit phase are the same as the earlier tier validation code in DHT rebalance. The “brickop” phase has been changed so that the status command can use this framework. The service management framework is now used. DHT rebalance does not use this framework. This service framework takes care of : *) spawning the daemon, killing it and other such processes. *) volume set options , which are written on the volfile. *) restart and reconfigure functions. Restart is to restart the daemon at two points 1)after gluster goes down and comes up. 2) to stop detach tier. *) reconfigure is used to make immediate volfile changes. By doing this, we don’t restart the daemon. it has the code to rewrite the volfile for topological changes too (which comes into place during add and remove brick). With this patch the log, pid, and volfile are separated and put into respective directories. Change-Id: I3681d0d66894714b55aa02ca2a30ac000362a399 BUG: 1313838 Signed-off-by: hari gowtham <hgowtham@redhat.com> Reviewed-on: http://review.gluster.org/13365 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: hari gowtham <hari.gowtham005@gmail.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* libglusterfs: fix statvfs in FreeBSDXavier Hernandez2017-01-102-1/+37
| | | | | | | | | | | | | | | | | FreeBSD interprets statvfs' f_bsize field in a different way than Linux. This fix modifies the value returned by statvfs() on FreeBSD to match the expected value by Gluster. Change-Id: I930dab6e895671157238146d333e95874ea28a08 BUG: 1356076 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/16361 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* dht: At places needed use STACK_WIND_COOKIEPoornima G2017-01-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: frame has a void * cookie pointer. In case of STACK_WIND_COOKIE frame->cookie is assigned to what is sent by the caller. In case of STACK_WIND frame->cookie is assigned to point point to the frame itself. For ease of coding, at many places, the cookie in the cbk is used to get the pointer to the next xl. This is inconsistent when STACK_WIND_TAIL comes into picture. Eg: dht_setxattr () { for (i = 0 ; i < conf->subvolume_cnt ; i++) { STACK_WIND (..dht_checking_pathinfo_cbk, conf->subvolumes[i] ..); } dht_checking_pathinfo_cbk (...void *cookie...) { prev = cookie; ... for (i = 0; i < conf->subvolume_cnt; i++) { if (conf->subvolumes[i] == prev->this) { ... } } } Consider the below graph: dht (parent) readdir-ahead => Doesn't define setxattr and uses STACK_WIND_TAIL protocol-client With this graph, when dht_checking_pathinfo_cbk is called, cookie will have frame pointing to protocol-client. i.e. prev->this will be protocol-client. But dht was expecting it to be readdir-ahead as it has stored in conf->subvolumes[i] Solution: Hence, as a thumb rule, if cbk is using cookie, then we explicitly call STACK_WIND_COOKIE. Change-Id: I83aea1e24c809c5a91a0db7283e908e125471bd4 BUG: 1401812 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/16039 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd: Get maximum supported op-version in a clusterSamikshan Bairagya2017-01-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gluster volume get <VOLNAME> cluster.opversion gives us the current op-version on which the cluster is operating. There is no command that lets the user know the maximum supported op-version that the cluster can run on. This patch adds a new global option cluster.max-op-version, that can be used to retrieve the maximum supported op-version in a cluster. Usage: # gluster volume get all cluster.max-op-version Example output: Option Value ------ ----- cluster.max-op-version 30900 NOTE: The only way to test this feature for now is to set the GD_OP_VERSION_MAX macro to different values (30800 for 3.8,30900 for 3.9, and so on) and rebuild glusterd. Since the regression test framework currently doesn't have support to simulate these tests, there are no accompanying regression tests for this feature. It should be possible to add tests once glusto comes in and makes it easier to run a heterogeneous cluster. Change-Id: I547480ee5e7912664784643e436feb198b6d16d0 BUG: 1365822 Signed-off-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-on: http://review.gluster.org/16283 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* posix: make sure atime and mtime are set when calling lutimes()Niels de Vos2017-01-061-0/+12
| | | | | | | | | | | | | | | | | | | | | When overwriting an existing file with O_TRUNC, the 'atime' was set to 0, meaning the Epoch (01-Jan-1970 UTC). However, the 'mtime' gets updated correcty. In case 'atime' or 'mtime' is not passed in the 'struct iatt', the time values passed to the systemcall are taken from the current values are returned by lstat(). Change-Id: I7021b7161dcd6c9a3e515d98f6d4847533c434b3 BUG: 1401777 Reported-by: Eivind Sarto <eivindsarto@gmail.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/16034 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* ec: Invalidations in disperse volume should not update the statPoornima G2017-01-052-4/+2
| | | | | | | | | | | | | | | | | | | | | | Issue: In disperse volume, the file is present across bricks, hence the stat from one brick doesn't carry the valid size of the file. Therefore the upcall from one brick updating the md-cache results in wrong size being updated. Fix: If the notification is cache invalidation then, indicate md-cache that the attributes is invalid. BUG: 1410375 Change-Id: Id89d2283478e70b62b435a8891fffc86d2be8cb2 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/16329 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* libglusterfs: serialize init/reconfigure callsJeff Darcy2017-01-053-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | These functions do not generally "expect" to be called more than once in parallel, and many are likely to misbehave in that case (one case in DHT already). Such parallel calls have not generally happened because there are only a few places where we call these functions, and those have been implicitly serialized until recently. However, recent changes in the epoll layer change that, as does brick multiplexing. Therefore, the serialization is now explicit at the init/reconfigure level. It would be sufficient to serialize calls to a particular translator's init and reconfigure functions, but that would require per-translator locks and a bit more complexity in maintaining/using them. Since there's no clear reason why we would need or want to support a higher level of parallelism, the simpler approach of a global lock should suffice. Change-Id: I26296c2826e91dc00b7f0c2061bcc2964ef90c4c BUG: 1399134 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/16030 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* eventsapi: Use `getaddrinfo` instead of `gethostbyname`Aravinda VK2017-01-052-6/+23
| | | | | | | | | | | | | | | `gethostbyname` is not thread safe. Use `getaddrinfo` to avoid any race or segfault while sending events BUG: 1410313 Change-Id: I164af1f8eb72501fb0ed47445e68d896f7c3e908 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/16327 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* core: Disable the memory pooler in Gluster via a build flagShreyas Siravara2017-01-041-2/+16
| | | | | | | | | | | | | | | | | Summary: Passing --disable-mempool to configure will disable the mempool. Change-Id: I60d5f70d54de507fe9f4695d7589f7ae1ba4bb0f BUG: 1405165 Signed-off-by: Shreyas Siravara <sshreyas@fb.com> Reviewed-on: http://review.gluster.org/16148 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Anoop C S <anoopcs@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* performance/readdir-ahead: limit cache sizeRaghavendra G2016-12-223-0/+41
| | | | | | | | | | | | | | | | This patch introduces a new option called "rda-cache-limit", which is the maximum value the entire readdir-ahead cache can grow into. Since, readdir-ahead holds a reference to inode through dentries, this patch also accounts memory stored by various xlators in inode contexts. Change-Id: I84cc0ca812f35e0f9041f8cc71effae53a9e7f99 BUG: 1356960 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/16137 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Poornima G <pgurusid@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: Fix lk-owner set race in ec_unlockPranith Kumar K2016-12-131-0/+6
| | | | | | | | | | | | | | | | | | | | | | Problem: Rename does two locks. There is a case where when it tries to unlock it sends xattrop of the directory with new version, callback of these two xattrops can be picked up by two separate epoll threads. Both of them will try to set the lk-owner for unlock in parallel on the same frame so one of these unlocks will fail because the lk-owner doesn't match. Fix: Specify the lk-owner which will be set on inodelk frame which will not be over written by any other thread/operation. BUG: 1402710 Change-Id: I666ffc931440dc5253d72df666efe0ef1d73f99a Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/16074 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* refcount: return pointer to the structure instead of a counterNiels de Vos2016-12-112-19/+12
| | | | | | | | | | | | | | | | There are no real users of the counter. It was thought of a handy tool to track and debug refcounting, but it is not used at all. Some parts of the code would benefit from a pointer getting returned instead. BUG: 1399780 Change-Id: I97e52c48420fed61be942ea27ff4849b803eed12 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/15971 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* syncop: fix conditional wait bug in parallel dir scanRavishankar N2016-12-091-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: The issue as seen by the user is detailed in the BZ but what is happening is if the no. of items in the wait queue == max-qlen, syncop_mt_dir_scan() does a pthread_cond_wait until the launched synctask workers dequeue the queue. But if for some reason the worker fails, the queue is never emptied due to which further invocations of syncop_mt_dir_scan() are blocked forever. Fix: Made some changes to _dir_scan_job_fn - If a worker encounters error while processing an entry, notify the readdir loop in syncop_mt_dir_scan() of the error but continue to process other entries in the queue, decrementing the qlen as and when we dequeue elements, and ending only when the queue is empty. - If the readdir loop in syncop_mt_dir_scan() gets an error form the worker, stop the readdir+queueing of further entries. Change-Id: I39ce073e01a68c7ff18a0e9227389245a6f75b88 BUG: 1402841 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/16073 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glusterd/geo-rep: Fix glusterd crashKotresh HR2016-12-081-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: glusterd crashes when geo-rep mountbroker setup is created if the slave user length is more than 8 characters. Cause: _POSIX_LOGIN_NAME_MAX is used which is 9 including NULL byte. Analysis: While the man page says it sufficient for portability, but acutally it's not. Linux allows the creation of username upto 32 characters by default where the max length is 256. And NetBSD's max is 17. Linux: #getconf LOGIN_NAME_MAX 256 NetBSD: #getconf LOGIN_NAME_MAX 17 Fix: Use LOGIN_NAME_MAX instead of _POSIX_LOGIN_NAME_MAX Change-Id: I26b7230433ecbbed6e6914ed39221a478c0266a8 BUG: 1368138 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/16053 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* dht/md-cache: Filter invalidate if the file is made a linkto filePoornima G2016-12-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Upcall as a part of setattr, sends an invalidation and the invalidation carries the resulting stat value. When a file is converted to linkto files, even then an invalidation is set and as a result the mountpoint shows the sticky bit in the stat of the file. eg: ---------T. 945 root root 0 Nov 8 10:14 hardlink.999 Fix: When dht recieves a notification of sticky bit change, it updates the flag, to indicate md-cache to send the subsequent lookup. Change-Id: Ic2fd7a5b196db0754f9b97072e644e6bf69da606 BUG: 1392713 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15789 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Susant Palai <spalai@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* cluster/tier: fix op-version for tier-query-limitMilind Changire2016-12-011-0/+2
| | | | | | | | | | | | | | Correct the op-version for tier-query-limit option from 3.9.0 to 3.9.1 Change-Id: I3a52a94c2708a97c18377e945d559a51d8025c41 BUG: 1366648 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: http://review.gluster.org/15990 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* libglusterfs: Fix a read hangPoornima G2016-11-281-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: ===== In certain cases, there was no unwind of read from read-ahead xlator, thus resulting in hang. RCA: ==== In certain cases, ioc_readv() issues STACK_WIND_TAIL() instead of STACK_WIND(). One such case is when inode_ctx for that file is not present (can happen if readdirp was called, and populates md-cache and serves all the lookups from cache). Consider the following graph: ... io-cache (parent) | readdir-ahead | read-ahead ... Below is the code snippet of ioc_readv calling STACK_WIND_TAIL: ioc_readv() { ... if (!inode_ctx) STACK_WIND_TAIL (frame, FIRST_CHILD (frame->this), FIRST_CHILD (frame->this)->fops->readv, fd, size, offset, flags, xdata); /* Ideally, this stack_wind should wind to readdir-ahead:readv() but it winds to read-ahead:readv(). See below for explaination. */ ... } STACK_WIND_TAIL (frame, obj, fn, ...) { frame->this = obj; /* for the above mentioned graph, frame->this will be readdir-ahead * frame->this = FIRST_CHILD (frame->this) i.e. readdir-ahead, which * is as expected */ ... THIS = obj; /* THIS will be read-ahead instead of readdir-ahead!, as obj expands * to "FIRST_CHILD (frame->this)" and frame->this was pointing * to readdir-ahead in the previous statement. */ ... fn (frame, obj, params); /* fn will call read-ahead:readv() instead of readdir-ahead:readv()! * as fn expands to "FIRST_CHILD (frame->this)->fops->readv" and * frame->this was pointing ro readdir-ahead in the first statement */ ... } Thus, the readdir-ahead's readv() implementation will be skipped, and ra_readv() will be called with frame->this = "readdir-ahead" and this = "read-ahead". This can lead to corruption / hang / other problems. But in this perticular case, when 'frame->this' and 'this' passed to ra_readv() doesn't match, it causes ra_readv() to call ra_readv() again!. Thus the logic of read-ahead readv() falls apart and leads to hang. Solution: ========= Modify STACK_WIND_TAIL() as: STACK_WIND_TAIL (frame, obj, fn, ...) { next_xl = obj /* resolve obj as the variables passed in obj macro can be overwritten in the further instrucions */ next_xl_fn = fn /* resolve fn and store in a tmp variable, before modifying any variables */ frame->this = next_xl; ... THIS = next_xl; ... next_xl_fn (frame, next_xl, params); ... } As a part of http://review.gluster.org/15901/ the caller io-cache was fixed. BUG: 1388292 Change-Id: Ie662ac8f18fa16909376f1e59387bc5b886bd0f9 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15923 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* libglusterfs:Now mempool is added to ctx pool list under a lockRajesh Joseph2016-11-221-1/+5
| | | | | | | | | | | | | | | | | | | mempool is added to ctx pool list without any lock. This can cause undefined behaviour in case of multithreaded environment. Fix: modify the list only under ctx->lock Change-Id: I7bdbb3db48a899bb0e41427e149b13c0facaedba Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> BUG: 1394719 Reviewed-on: http://review.gluster.org/15842 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Poornima G <pgurusid@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* afr,dht,ec: Replace GF_EVENT_CHILD_MODIFIED with event SOME_DESCENDENT_DOWN/UPPoornima G2016-11-211-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently these are few events related to child_up/down: GF_EVENT_CHILD_UP : Issued when any of the protocol client connects. GF_EVENT_CHILD_MODIFIED : Issued by afr/dht/ec GF_EVENT_CHILD_DOWN : Issued when any of the protocol client disconnects. These events get modified at the dht/afr/ec layers. Here is a brief on the same. DHT: - All the subvolumes reported once, and atleast one child came up, then GF_EVENT_CHILD_UP is issued - connect GF_EVENT_CHILD_UP is issued - disconnect GF_EVENT_CHILD_MODIFIED is issued - All the subvolumes disconnected, GF_EVENT_CHILD_DOWN is issued AFR: - First subvolume came up, then GF_EVENT_CHILD_UP is issued - Subsequent subvolumes coming up, results in GF_EVENT_CHILD_MODIFIED - Any of the subvolumes go down, then GF_EVENT_SOME_CHILD_DOWN is issued - Last up subvolume goes down, then GF_EVENT_CHILD_DOWN is issued Until the patch [1] introduced GF_EVENT_SOME_CHILD_UP, GF_EVENT_CHILD_MODIFIED was issued by afr/dht when any of the subvolumes go up or down. Now with md-cache changes, there is a necessity to differentiate between child up and down. Hence, introducing GF_EVENT_SOME_DESCENDENT_DOWN/UP and getting rid of GF_EVENT_CHILD_MODIFIED. [1] http://review.gluster.org/12573 Change-Id: I704140b6598f7ec705493251d2dbc4191c965a58 BUG: 1396038 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15764 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* events: Add FMT_WARN for gf_eventPranith Kumar K2016-11-182-3/+9
| | | | | | | | | | | | | | | | | Raghavendra G found that posix is trying to print %s but passing an int when HEALTH_CHECK fails in posix. These are the kind of bugs that should be caught at compilation itself. Also fixed the problematic gf_event() callers. BUG: 1386097 Change-Id: Id7bd6d9a9690237cec3ca1aefa2aac085e8a1270 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15671 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* system/posix-acl: Log reason for EACCESPranith Kumar K2016-11-174-2/+16
| | | | | | | | | | | | | | | | It is becoming increasingly difficult to debug the reason why posix-acl decides to fail a fop with EACCES. This patch prints a big log everytime such a condition occurs giving out the details that may help in finding why the fop is errored out. Change-Id: I2505baaafb5d77ef6c187554ff027df9b20468db BUG: 1394548 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15837 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
* protocol/server: Print error-xlator namePranith Kumar K2016-11-161-0/+22
| | | | | | | | | | | | | | | | | | | | Problem: At the moment from which xlator the errors are stemming from is a mystery. Fix: With this patch we can find on the server side which xlator first gave the errno received by server xlator. I am not yet sure how to get this done for client side which has lot of copy_frame()s. May be another patch. Change-Id: Ie13307b965facf2a496123e81ce0bd6756f98ac9 BUG: 1394548 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15836 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* jbr: Sending rollback from failed fop to fdlAvra Sengupta2016-11-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of a failed fop, the failure is detected by the leader in the jbr-server in two places. First during a quorum check of +ve responses when it receives responses from all the followers. At this point if the fop hasn't been successfully journaled at a quorum of followers (as in there is no merit in trying the fop in the leader as the quorum will never be met), then we fail the fop. Also if this quorum is met, then the fop is tried on the leader, and after the leader completes the fop a quorum check similar to the previous one is done again, this time including the leaders outcome. If quorum is not met, then we fail the fop. In both these cases, when the fop fails we send a -ve ack to the client. With this patch, now we will also send a rollback through a GF_FOP_IPC to all the followers(and also to the leader in the second case of failure). This rollback will contain the index and term number of the fop which failed. This will be recorded in the respective journals of the bricks and will be used to rollback the fop on that brick later. A subsequent write, and it's respective rollback would look something like the following in the journal. The trusted.jbr.term and trusted.jbr.index present in the dict of both the logs, relate them, and the presence of "rollback-fop" in the dict of IPC indicates that it is a rollback fop, and the value 13(stands for GF_FOP_WRITE) indicates what kind of rollback operation it is. === GF_FOP_WRITE fd = <gfid 77f12ea2-ca56-40e3-a46e-ba2308baa035> vector = <158 bytes> offset = 0 (0x0) flags = 32769 (0x8001) xdata = dict { trusted.jbr.term = 0 <2 bytes> trusted.jbr.index = 4 <2 bytes> } === GF_FOP_IPC xdata = dict { trusted.jbr.term = 0 <2 bytes> trusted.jbr.index = 4 <2 bytes> rollback-fop = 13 <3 bytes> } Change-Id: I70b6a143d20697153d58e2f719e34ecd1ed160a5 BUG: 1349385 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/14783 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* features/shard: Fill loc.pargfid too for named lookups on individual shardsKrutika Dhananjay2016-11-082-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a sharded volume when a brick is replaced while IO is going on, named lookup on individual shards as part of read/write was failing with ENOENT on the replaced brick, and as a result AFR initiated name heal in lookup callback. But since pargfid was empty (which is what this patch attempts to fix), the resolution of the shards by protocol/server used to fail and the following pattern of logs was seen: Brick-logs: [2016-11-08 07:41:49.387127] W [MSGID: 115009] [server-resolve.c:566:server_resolve] 0-rep-server: no resolution type for (null) (LOOKUP) [2016-11-08 07:41:49.387157] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-rep-server: 91833: LOOKUP(null) (00000000-0000-0000-0000-000000000000/16d47463-ece5-4b33-9c93-470be918c0f6.82) ==> (Invalid argument) [Invalid argument] Client-logs: [2016-11-08 07:41:27.497687] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2016-11-08 07:41:27.497755] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2016-11-08 07:41:27.498500] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2016-11-08 07:41:27.499680] E [MSGID: 133010] Also, this patch makes AFR by itself choose a non-NULL pargfid even if its ancestors fail to initialize all pargfid placeholders. Change-Id: I5f85b303ede135baaf92e87ec8e09941f5ded6c1 BUG: 1392445 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15788 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* cluster/tier: break the monolith processing functionMilind Changire2016-10-261-0/+2
| | | | | | | | | | | | | | Break tier_migrate_using_query_file() into a more manageable tier_migrate_link() and helpers. Change-Id: I5eb2d2cff9e7a2a2da567c3c4c2d53aab195f477 BUG: 1358296 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: http://review.gluster.org/14957 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* md-cache, afr: Reduce the window of stale readPoornima G2016-10-201-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Consider a replica setup, where one mount writes data to a file and the other mount reads the file. In afr, read operations are not transaction based, a brick(read subvolume) is chosen as a part of lookup or other operations, read is always wound only to the read subvolume, even if there was write from a different client that failed on this brick. This stale read continues until there is a lookup or any write operation from the mount point. Currently, this is not a major issue, as a lookup is issued before every read and it will switch the read subvolume to a correct one. But with the plan of increasing md-cache timeout to 600s, the stale read problem will be more pronounced, i.e. stale read can continue for 600s(or more if cascaded with readdirp), as there will be no lookups. Solution: Afr doesn't have any built-in solution for stale read(without affecting the performance). The solution that came up, was to use upcall. When a file on any brick is marked bad for the first time, upcall sends a notification to all the clients that had recently accessed the file. The solution has 2 parts: - Identifying when a file is marked bad, on any of the bricks, for the first time - Client side actions on recieving the notifications Identifying when a file is marked bad on any of the bricks for the first time: ----------------------------------------------------------------------------- The idea is to track xattrop in upcall. xattrop currently comes with 2 afr xattrs - afr dirty bit and afr pending xattrs. Dirty xattr is set to 1 before every write, and is unset if write succeeds. In certain scenarios, dirty xattr can be 0 and still the file could be bad copy. Hence do not track dirty xattr. Pending xattr is set on the good copy, indicating the other bricks that have bad copy. It is still not as simple as, notifying when any of the pending xattrs change. It could lead to flood of notifcations, in case the other brick is completely down or consistantly failing. Hence it is important to notify only once, the first time a good copy is marked bad. Client side actions on recieving pending xattr change, notification: -------------------------------------------------------------------- md-cache will invalidate the cache of that file, so that further lookup is passed down to afr and hence update the read subvolume. Invalidating only in md-cache is not enough, consider the folling oder of opertaions: - pending xattr invalidation - invalidate md-cache - readdirp on the bad read subvolume - fill md-cache - lookup (served from md-cache) - read - wound to the old read subvol. Hence, along with invalidating md-cache, it is very important to reset the read subvolume for that file, in afr. Design Credit: Anuradha Talur, Ravishankar N 1. xattrop doesn't carry info saying post op/pre op. 2. Pre xattrop will have 0 value for all pending xattrs, the cbk of pre xattrop carries the on-disk xattr value. Non zero indicated healing is required. 3. Post xattrop will have non zero value for any of the pending xattrs, if the fop failed on any of the bricks. Change-Id: I469cbc111714c433984fe1c922be2ef113c25804 BUG: 1211863 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15398 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/tier: handle fast demotionsMilind Changire2016-10-197-49/+102
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Demote files on priority if hi-watermark has been breached and continue to demote until the watermark drops below hi-watermark. Monitor watermark more frequently. Trigger demotion as soon as hi-watermark is breached. Add cluster.tier-emergency-demote-query-limit option to limit number of files returned from the database query for every iteration of tier_migrate_using_query_file(). If watermark hasn't dropped below hi-watermark during the first iteration, the next iteration will be triggered approximately 1 second after tier_demote() returns to the main tiering loop. Update changetimerecorder xlator to handle query for emergency demote mode. Add tier-ctr-interface.h: Move tier and ctr interface specific macros and struct definition from libglusterfs/src/gfdb/gfdb_data_store.h to new header libglusterfs/src/tier-ctr-interface.h Change-Id: If56af78c6c81d37529b9b6e65ae606ba5c99a811 BUG: 1366648 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: http://review.gluster.org/15158 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* libglusterfs/client_t: cleanup username and passwd in client_destroy()Vijay Bellur2016-10-111-0/+2
| | | | | | | | | | | | | Thanks to bingxuan.zhang at nokia dot com for the report and patch. Change-Id: If2b2151ab4df27d769126860d98770c80bc8a534 BUG: 1377584 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/15613 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* glusterfsd/main: fix OOM adjustment for older kernelsOleksandr Natalenko2016-10-111-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Milind Changire reported that GlusterFS fails to build on RHEL5 because linux/oom.h is unavailable. Milind's initial patch disables OOM adjustment completely for those environments that do not have this header. However, I'd take another approach that: 1) checks for linux/oom.h in compile-time and defines necessary constants if the header is not present; 2) checks for available OOM API in /proc in run-time and uses it accordingly. This allows OOM to be adjusted properly on RHEL5 (the kernel is pretty new to present /proc API for that) as well as RHEL6 (the kernel has many thing backported including new /proc API). Change-Id: I1bc610586872d208430575c149a7d0c54bd82370 BUG: 1379769 Signed-off-by: Oleksandr Natalenko <onatalen@redhat.com> Reviewed-on: http://review.gluster.org/15587 Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: Implement heal info with lockAshish Pandey2016-10-112-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: Currently heal info command prints all the files/directories if the index for the file/directory is present in .glusterfs/indices folder. After implementing patch http://review.gluster.org/#/c/13733/ indices of the file which is going through update fop will also be present in .glusterfs/indices even if the fop is successful on all the brick. At this time if heal info command is being used, it will also display this file which is actually healthy and does not require any heal. Solution: Take lock on a file corresponding to the indices and inspect xattrs to decide if the file needs heal or not. Change-Id: I6361e2813ece369be12d02e74816df4eddb81cfa BUG: 1366815 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/15543 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* afr: fix incorrect debug log in selfheal pathRavishankar N2016-10-041-1/+1
| | | | | | | | | | | | | | | | | | | 1. While looking at glustershd logs in DEBUG log-level, it was found that all bricks of the replica were printed as local bricks even though they were not. Fixed it. 2. Print the name of the subvol from which the entry was got during index crawl. Change-Id: I08b32e38694c755715e9fe0c0e1dd9212abcfb16 BUG: 1381421 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/15610 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr: Ignore gluster internal (virtual) xattrs in metadata heal checkRavishankar N2016-09-262-27/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In arbiter configuration, posix-xlator in the arbiter brick always sets the GF_CONTENT_KEY in the response dict with a value 0. If the file size on the data bricks is more than quick-read's max-file-size (64kb default), those bricks don't set the key. Because of this difference in the no. of dict elements, afr triggers metadata heal in lookup code path, in turn leading to extra lookups+inodelks. Fix: Changed afr dict comparison logic to ignore all virtual xattrs and the on-disk ones that we should not be healing. Also removed is_virtual_xattr() function. The original callers to this function (upcall) don't seem to need it anymore. Change-Id: I05730bdd39d8fb0b9a49a5fc9c0bb01f0d3bb308 BUG: 1378684 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/15548 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* tier/ctr/sql : Dafault values for sql cache and wal sizeJoseph Fernandes2016-09-221-2/+2
| | | | | | | | | | | | | | | | Setting default values for sql cache and wal size cache : 12500 pages wal : 25000 pages 1 pages - 4096 bytes Change-Id: Iae3927e021af2e3f7617d45f84e81de3b7d93f1c BUG: 1377864 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/15536 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Milind Changire <mchangir@redhat.com>
* build: out-of-tree builds generates files in the wrong directoryKaleb S KEITHLEY2016-09-182-28/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | And minor cleanup of a few of the Makefile.am files while we're at it. Rewrite the make rules to do what xdrgen does. Now we can get rid of xdrgen. Note 1. netbsd6's sed doesn't do -i. Why are we still running smoke tests on netbsd6 and not netbsd7? We barely support netbsd7 as it is. Note 2. Why is/was libgfxdr.so (.../rpc/xdr/src/...) linked with libglusterfs? A cut-and-paste mistake? It has no references to symbols in libglusterfs. Note3. "/#ifndef\|#define\|#endif/" (note the '\'s) is a _basic_ regex that matches the same lines as the _extended_ regex "/#(ifndef|define|endif)/". To match the extended regex sed needs to be run with -r on Linux; with -E on *BSD. However NetBSD's and FreeBSD's sed helpfully also provide -r for compatibility. Using a basic regex avoids having to use a kludge in order to run sed with the correct option on OS X. Note 4. Not copying the bit of xdrgen that inserts copyright/license boilerplate. AFAIK it's silly to pretend that machine generated files like these can be copyrighted or need license boilerplate. The XDR source files have their own copyright and license; and their copyrights are bound to be more up to date than old boilerplate inserted by a script. From what I've seen of other Open Source projects -- e.g. gcc and its C parser files generated by yacc and lex -- IIRC they don't bother to add copyright/license boilerplate to their generated files. It appears that it's a long-standing feature of make (SysV, BSD, gnu) for out-of-tree builds to helpfully pretend that the source files it can find in the VPATH "exist" as if they are in the $cwd. rpcgen doesn't work well in this situation and generates files with "bad" #include directives. E.g. if you `rpcgen ../../../../$srcdir/rpc/xdr/src/glusterfs3-xdr.x`, you get an #include directive in the generated .c file like this: ... #include "../../../../$srcdir/rpc/xdr/src/glusterfs3-xdr.h" ... which (obviously) results in compile errors on out-of-tree build because the (generated) header file doesn't exist at that location. Compared to `rpcgen ./glusterfs3-xdr.x` where you get: ... #include "glusterfs3-xdr.h" ... Which is what we need. We have to resort to some Stupid Make Tricks like the addition of various .PHONY targets to work around the VPATH "help". Warning: When doing an in-tree build, -I$(top_builddir)/rpc/xdr/... looks exactly like -I$(top_srcdir)/rpc/xdr/... Don't be fooled though. And don't delete the -I$(top_builddir)/rpc/xdr/... bits Change-Id: Iba6ab96b2d0a17c5a7e9f92233993b318858b62e BUG: 1330604 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/14085 Tested-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* libglusterfs/mem-pool: fix unused variable warnings/errorsKaleb S. KEITHLEY2016-09-161-1/+0
| | | | | | | | | | | | | | | | | | http://review.gluster.org/14085 fixes a "pragma leak" where the generated rpc/xdr headers have a pair of pragmas that disable these warnings. With the warnings disabled, many unused variables have crept into the code base. And 14085 won't pass its own smoke test until all these warnings are fixed. BUG: 1369124 Change-Id: I8885e4d4aa44307c240413ba6f35ecd59ab45444 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/15516 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* libglusterfs: Fix wrong git repository urlAnoop C S2016-09-081-1/+1
| | | | | | | | | | | | Change-Id: Iecd4cfacd7536cb3cc9a569c8d96e5284f5e2152 BUG: 1198849 Signed-off-by: Anoop C S <anoopcs@redhat.com> Reviewed-on: http://review.gluster.org/15423 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* cluster/tier: Adding compaction option for metadata databasesDiogenes Nunez2016-09-046-42/+298
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: As metadata in the database fills up, querying the database take a long time. As a result, tier migration slows down. To counteract this, we added a way to enable the compaction methods of the underlying database. The goal is to reduce the size of the underlying file by eliminating database fragmentation. NOTE: There is currently a bug where sometimes a brick will attempt to activate compaction. This happens even compaction is already turned on. The cause is narrowed down to the compact_mode_switch flipping its value. Changes: libglusterfs/src/gfdb - Added a gfdb function to compact the underlying database, compact_db() This is a no-op if the database has no such option. - Added a compaction function for SQLite3 that does the following 1) Changes the auto_vacuum pragma of the database 2) Compacts the database according to the type of compaction requested - Compaction type can be changed by changing the macro GF_SQL_COMPACT_DEF to one of the 4 compaction types in gfdb_sqlite3.h It is currently set to GF_SQL_COMPACT_INCR, or incremental vacuuming. xlators/cluster/dht/src - Added the following command-line option to enable SQLite3 compaction. gluster volume set <vol-name> tier-compact <off|on> - Added the following command-line option to change the frequency the hot and cold tier are ordered to compact. gluster volume set <vol-name> tier-hot-compact-frequency <int> gluster volume set <vol-name> tier-cold-compact-frequency <int> - tier daemon periodically sends the (new) GFDB_IPC_CTR_SET_COMPACT_PRAGMA IPC to the CTR xlator. The IPC triggers compaction of the database. The inputs are both gf_boolean_t. IPC Input: compact_active: Is compaction currently on for the db. compact_mode_switched: Did we flip the compaction switch recently? IPC Output: 0 if the compaction succeeds. Non-zero otherwise. xlators/features/changetimerecorder/src/ - When the CTR gets the compaction IPC, it launches a thread that will perform the compaction. The IPC ends after the thread is launched. To avoid extra allocations, the parameters are passed using static variables. Change-Id: I5e1433becb9eeff2afe8dcb4a5798977bf5ba0dd Signed-off-by: Diogenes Nunez <dnunez@redhat.com> Reviewed-on: http://review.gluster.org/15031 Reviewed-by: Milind Changire <mchangir@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* afr: Consume compound fops in afr transactionAnuradha Talur2016-09-011-0/+4
| | | | | | | | | | | | | Change-Id: Ib06ece3cce1b10d28d6d2953da28444f5c2457ad BUG: 1290304 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/15014 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* io-stats: Add stats for upcall notificationsv3.10devPoornima G2016-08-312-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With this patch, there will be additional entries seen in the profile info: UPCALL : Total number of upcall events that were sent from the brick(in brick profile), and number of upcall notifications recieved by client(in client profile) Cache invalidation events: ------------------------- CI_IATT : Number of upcalls that were cache invalidation and had one of the IATT_UPDATE_FLAGS set. This indicates that one of the iatt value was changed. CI_XATTR : Number of upcalls that were cache invalidation, and had one of the UP_XATTR or UP_XATTR_RM set. This indicates that an xattr was updated or deleted. CI_RENAME : Number of upcalls that were cache invalidation, resulted by the renaming of a file or directory CI_UNLINK : Number of upcalls that were cache invalidation, resulted by the unlink of a file. CI_FORGET : Number of upcalls that were cache invalidation, resulted by the forget of inode on the server side. Lease events: ------------ LEASE_RECALL : Number of lease recalls sent by the brick (in brick profile), and number of lease recalls recieved by client(in client profile) Note that the sum of CI_IATT, CI_XATTR, CI_RENAME, CI_UNLINK, CI_FORGET, LEASE_RECALL may not be equal to UPCALL. This is because, each cache invalidation can carry multiple flags. Eg: - Every CI_XATTR will have CI_IATT - Every CI_UNLINK will also increment CI_IATT as link count is an iatt attribute. Also UP_PARENT_DENTRY_FLAGS is currently not accounted for, as CI_RENAME and CI_UNLINK will always have the flag UP_PARENT_DENTRY_FLAGS Change-Id: Ieb8cd21dde2c4c7618f12d025a5e5156f9cc0fe9 BUG: 1371543 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15193 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* compound fops: Some fixes to compound fops frameworkAnuradha Talur2016-08-305-24/+132
| | | | | | | | | | | Change-Id: I808fd5f9f002a35bff94d310c5d61a781e49570b BUG: 1360169 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/15010 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht, md-cache, upcall: Add invalidation of IATT when the layout changesPoornima G2016-08-301-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: dht_layout is built as a part of lookup only. The layout can be modified by rebalance process. Since every IO fop is preceded by a lookup, there are very less issues of stale layout. But with enhancements of aggressive caching of stats in md-cache, the lookup will reduce and expose the stale layout issue often. Solution: Since stale layout is already an issue on dht, there is already a plan to fix this at the dht layer, but this fix is not currently planned for any release. Until this fix comes out, we can have a workaround where, the upcall will send a notification to md-cache when a layout xattr is changed. As a part of layout change notification the existing cache is invalidated and the next lookup will fetch the latest layout. This is not a foolproof solution as the window between the layout change and the next lookup(after invalidation of stat), where there will be stale layout. But until the final fix comes in, this reduces the stale layout window. Change-Id: Iacf871a38b35880c1fc0bc68fe7ce291265e71d4 BUG: 1369638 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15300 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* md-cache: Register the list of xattrs with cache-invalidationPoornima G2016-08-303-0/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: md-cache caches a specified list of xattrs, and when cache invalidation is enabled, it makes sense to recieve invalidation only when those xattrs are modified by other clients. But the current implementation of upcall is that, it will send invalidation when any of the on-disk xattrs is modified. Solution: md-cache sends a list of xattrs that it is interested in, to upcall by issuing an ipc(). The challenge here is to make sure everytime a brick goes offline and comes back up, the ipc() needs to be issued to the bricks. Hence ipc() is sent from md-cache every time there is a CHILD_UP/CHILD_MODIFIED event. TODO: There will be patches following, in cluster xlators, to implement ipc fop. Change-Id: I6efcf3df474f5ce6eabd3d6694c00c7bd89bc25d BUG: 1211863 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15002 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* eventsapi: Add support for Client side EventsAravinda VK2016-08-301-28/+62
| | | | | | | | | | | | | | | | | | | | | | | | | Client side gf_event uses ctx->cmd_args.volfile_server to push notifications to the eventsd. Socket server changed from Unix domain socket to UDP to support external events. Following to be addressed in different patch - Port used for eventsd is 24009. Make it configurable Already configurable in Server side. Configurable in gf_event API is required. - Auth Token yet to be added as discussed in https://www.gluster.org/pipermail/gluster-devel/2016-August/050324.html Change-Id: I159acf80b681d10b82d52cfb3ffdf85cb896542d BUG: 1367774 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/15189 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* upcall: Add permission change flag to iatt flagPoornima G2016-08-291-1/+1
| | | | | | | | | | | | | | | | | | | | | Currently IATT_UPDATE_FLAGS is (UP_NLINK | UP_MODE | UP_OWN | UP_SIZE | UP_TIMES | UP_ATIME). But it should also have UP_PERM and permission bits are apart of IATT. This change will have no effect on the existing users as IATT_UPDATE_FLAGS is currently used only by md-cache. And the changes in md-cache to consume cache invalidation is not part of any release yet. Change-Id: I7efad44972add152f5f981d733fb8191dc7ef8ef BUG: 1369432 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15301 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* dht: define GF_IPC_TARGET_UPCALLPrasanna Kumar Kalever2016-08-281-1/+2
| | | | | | | | | | | | | | | | | | In patch http://review.gluster.org/#/c/15225/ GF_IPC_TARGET_UPCALL was used but not defined, because of which build is broken Hence fixing it here. Change-Id: I81e1e63cba25a49fd72a8fd7a3b4a445cadae103 BUG: 1370862 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-on: http://review.gluster.org/15331 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* md-cache: Add cache hit and miss countersPoornima G2016-08-272-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | These counters can be accessed either by .meta interface or statedump. From meta: cat on the private file in md-cache directory. Eg: cat /mnt/glusterfs/0/.meta/graphs/active/patchy-md-cache/private [performance/md-cache.patchy-md-cache] stat_hit_count = 2 stat_miss_count = 8 xattr_hit_count = 4 xattr_miss_count = 3 nameless_lookup_count = 1 negative_lookup_count = 0 stat_invalidations_recieved = 1 xattr_invalidations_recieved = 1 Change-Id: Ib62a8822f263a9f75858b15832d0119fbe382629 BUG: 1211863 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15185 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* glusterd/cli: cli to get local state representation from glusterdSamikshan Bairagya2016-08-263-5/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently there is no existing CLI that can be used to get the local state representation of the cluster as maintained in glusterd in a readable as well as parseable format. The CLI added has the following usage: # gluster get-state [daemon] [odir <path/to/output/dir>] [file <filename>] This would dump data points that reflect the local state representation of the cluster as maintained in glusterd (no other daemons are supported as of now) to a file inside the specified output directory. The default output directory and filename is /var/run/gluster and glusterd_state_<timestamp> respectively. The option for specifying the daemon name leaves room to add support for other daemons in the future. Following are the data points captured as of now to represent the state from the local glusterd pov: * Peer: - Primary hostname - uuid - state - connection status - List of hostnames * Volumes: - name, id, transport type, status - counts: bricks, snap, subvol, stripe, arbiter, disperse, redundancy - snapd status - quorum status - tiering related information - rebalance status - replace bricks status - snapshots * Bricks: - Path, hostname (for all bricks these info will be shown) - port, rdma port, status, mount options, filesystem type and signed in status for bricks running locally. * Services: - name, online status for initialised services * Others: - Base port, last allocated port - op-version - MYUUID Change-Id: I4a45cc5407ab92d8afdbbd2098ece851f7e3d618 BUG: 1353156 Signed-off-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-on: http://review.gluster.org/14873 Reviewed-by: Avra Sengupta <asengupt@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* build: fix eventtypes.h generationPrasanna Kumar Kalever2016-08-261-1/+1
| | | | | | | | | | | | | | | | | make 'eventtypes.py' as dependent i.e. must build eventtypes.h Else the consequence will be like: error: 'EVENT_CLIENT_GRACE_TIMER_START' undeclared (first use in this function) Change-Id: I5fe2491d8d1e0c430b307026695d25475250ae79 BUG: 1370406 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-on: http://review.gluster.org/15327 Tested-by: Prasanna Kumar Kalever <pkalever@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* quotad: fix potential buffer overflowsRaghavendra G2016-08-252-6/+14
| | | | | | | | | | | | | | | | | | | This converts sprintf to gf_asprintf in following components: * quotad.c * dht * afr * protocol/client * rpc/rpc-lib * rpc/rpc-transport Change-Id: If8a267bab3d91003bdef3a92664077a0136745ee BUG: 1332073 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/14102 Tested-by: Manikandan Selvaganesh <mselvaga@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Manikandan Selvaganesh <mselvaga@redhat.com>