summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* features/bitrot: Throttle filesystem scrubberVenky Shankar2015-05-079-75/+760
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces multithreaded filesystem scrubber based on throttling option configured for a particular volume. The implementation "logically" breaks scanning and scrubbing with the number of scrubber threads auto-configured depending upon the throttle configuration. Scanning (crawling) is left single threaded (per brick) with entries scrubbed in bulk. On reaching this "bulk" watermark, scanner waits until entries are scrubbed. Bricks for a particular volume have a set of thread(s) assigned for scrubbing, with entries for each brick scrubbed in a round robin fashion to avoid scrub "stalls" when a brick (out of N bricks) is under active scrubbing. This mechanism helps us implement "pause/resume" with ease: all one need to do is to cleanup scrubber threads and let the main scanner thread "wait" untill scrubbing is resumed (where the scrubber thread(s) are spawned again), therefore continuing where we left off (unless we restart the deamons, where crawl initiates from root directory again, but I guess that's OK). [ NOTE: Throttling is optional for the signer daemon, without which it runs full throttle. However, passing "-DBR_RATE_LIMIT_SIGNER" predefined in CFLAGS enables CPU throttling (during checksum calculation) thereby avoiding high CPU usage. ] Subsequent patches would introduce CPU throttling during hash calculation for scrubber. Change-Id: I5701dd6cd4dff27ca3144ac5e3798a2216b39d4f BUG: 1207020 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10511 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Documentation for tiering feature. (WIP)Dan Lambright2015-05-071-0/+118
| | | | | | | | | | | This is a WIP. Change-Id: Ia36f77d158a370f77cb866a32308b27e10d39b5e BUG: 1218638 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/10656 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
* tiering/cli: while attaching tier to volume cli should ask questionGaurav Kumar Garg2015-05-071-1/+10
| | | | | | | | | | | | | | Because of tiering feature is recommended only for testing purpose in this release, attaching tier should get a confirmation from the user before proceeding with the command. Change-Id: I141bbb1d0439f0a28eb51d17f7800908e35c75ad BUG: 1219032 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Reviewed-on: http://review.gluster.org/10610 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/shard: Implement [f]truncate fopsKrutika Dhananjay2015-05-073-309/+804
| | | | | | | | | | | | | | To-Do: * Make ftruncate work even in the absence of path * Aggregate and update ia_blocks appropriately when a file is truncated to a lower size. Change-Id: Ifd24c2f5e80d2c3bc921261f5481251df8948126 BUG: 1207615 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10631 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* nfs: allocate the auth_cache->cache_dict on auth_cache_init()Niels de Vos2015-05-071-14/+10
| | | | | | | | | | | | | | | | It seems possible that auth_cache->cache_dict is not always allocated before it is accessed. Instead of allocating the dict upon the 1st access, just create it in auth_cache_init(). Change-Id: I00e60522478b433cb0aae0c1f0948eac544dfd2b URL: http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10710 BUG: 1143880 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10600 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Tested-by: NetBSD Build System
* glupy: fix tuntime search path and python module directory layoutEmmanuel Dreyfus2015-05-074-4/+16
| | | | | | | | | | | | | | | | | | | | 1) The glupy.so xlator should embed the runtime search path for the python libraries. Unfortunately, python-config does not gives the appprioate flags, therefore we need to also use pkg-config to obtain them 2) Fix the glupy python module directory layout so that python can import the module without problem That two fixes seems to let glupy.t pass on NetBSD again. BUG: 1129939 Change-Id: I397aa726ab8bf7d91fa0d6d870a30910a5f4a5d9 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10616 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* Restore build on non Linux systemsEmmanuel Dreyfus2015-05-072-4/+7
| | | | | | | | | | | | | | | | | | | This change broke the build on NetBSD, FreeBSD, and MacOS X: http://review.gluster.org/10526/ We restore the build with two fixes: - Use POSIX-compliant sysconf(_SC_NPROCESSORS_ONLN) to get the number of processors, instead of Linux specific get_nprocs(). That let us remove Linux-specific #include <sys/sysinfo.h> - Only define MAX() if it is not already defined. NetBSD defines it in <sys/param.h> which is already included BUG: 1129939 Change-Id: I62341c670598670e47ea2f69ab94864f96588b18 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10652 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* NFS-Ganesha : Improved sed expression in cleanupMeghana Madhusudhan2015-05-071-1/+1
| | | | | | | | | | | | | | | The global config file should not be emptied when tear down is called. Improving the sed expression to delete the ".conf" files only Signed-off-by: Meghana Madhusudhan <mmadhusu@redhat.com> Change-Id: Ida3d303f629cb512c02dadacce1ec7e5f07db018 BUG: 1218854 Reviewed-on: http://review.gluster.org/10630 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: soumya k <skoduri@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/bit-rot: Token Bucket based throttlingVenky Shankar2015-05-076-9/+432
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BitRot daemons (signer & scrubber) are disk/cpu hoggers when left running full throttle. Checksum calculations (especially SHA family of hash routines) can be quite CPU intensive. Moreover periodic disk scans performed by scrubber followed by reading data blocks for hash calculation (which is also done by signer) generate lot of heavy IO request(s). This causes interference with actual client operations (be it a regular client or filesystems daemons such as self-heal, etc..) and results in degraded system performance. This patch introduces throttling based on Token Bucket Filtering[1]. It's a well known algorithm for checking (and ensuring) that data transmission conform to defined limits and generally used in packet switched networks. Linux control groups (Cgroups) uses a variant[2] of this algorithm to provide block device IO throttling (cgroup subsys "blkio": blk-iothrottle). So, why not just live with Cgroups? Cgroups is linux specific. We need to have a throttling mechanism for other supported UNIXes. Moreover, having our own implementation gives much more finer control in terms of tuning it for our needs (plus the simplicity of the alogorithm itself). Ideally, throttling should be a part of server stack (either as a separate translator or integrated with io-threads) since that's the point of entry for IO request(s) from *all* client(s). That way one could selectively throttle IO request(s) based on client PIDs (frame->root->pid), e.g., self-heal daemon, bitrot, etc.. (*actual* clients can run full throttle). This implementation avoids that deliberately (there needs to be a much more smarter queueing mechanism) and throttles CPU usage for hash calculations. This patch is just the infrastructure part with no interfaces exposed to set various throttling values. The tunable selected here (basically hardcoded) avoids 100% CPU usage during hash calculation (with some bursts cycles). We'd need much more intensive test(s) to assign values for various throttling options (lazy/normal/aggressive). [1] https://en.wikipedia.org/wiki/Token_bucket [2] http://en.wikipedia.org/wiki/Token_bucket#Hierarchical_token_bucket Change-Id: Icc49af80eeab6adb60166d0810e69ef37cfe2fd8 BUG: 1207020 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10307 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* Tests: use a portable way to flush kernel cacheEmmanuel Dreyfus2015-05-073-16/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Linux, kernel cache can be flushed using echo 3 > /proc/sys/vm/drop_caches This non-portable approach can be replaced by an on-purpose failed attempt to unmount: if the mount point is the current directory and umount is called, the kernel will flush inodes until it realize it cannot complete the operation because root of filesystem is busy: ( cd $M0 ; umount $M0 ) Unfortunately this does not flush everything. Entries may still be present in the kenrel FUSE cache. Using $GFS to mount the filesystem ensure --entry-timeout=0 and clears this problem. Some stall information may also remain in glusterfs caches, and that may have to be adressed by appropriate volume option. For instance tests/bugs/rpc/bug-954057.t needs to disable performance.stat-prefetch. Qtherwise, root's new credentials are not evaluated after root-quash is enabled. The test could also be done with performance.stat-prefetch enabled using various tricks: copying the file to read, creating a hard link on it, or just waiting long enough for metadata cache to expire. BUG: 1129939 Change-Id: I54929e899d55c04dcd9d947809133549f01fd0e1 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10411 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Fix Default values in Xml outputAravinda VK2015-05-071-0/+9
| | | | | | | | | | | | | | Default Values for last_synced, checkpoint_time and checkpoint_completion_time was zero instead of 'N/A' BUG: 1212410 Change-Id: Ie775508f8dcb9ba6f311946a2039739e4336d9a6 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/10580 Reviewed-by: darshan n <dnarayan@redhat.com> Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* tools/glusterfind: Rpm install dependencies addedAravinda VK2015-05-071-0/+4
| | | | | | | | | | | | | | | | Argparse Python library is available as standard library in Python 2.7 In rhel, we need to install python-argparse. Also added pyxattr dependency to the spec file. Change-Id: Ie50278eddf81e7cbf98d1e8d27e411b46676b432 Signed-off-by: Aravinda VK <avishwan@redhat.com> BUG: 1209138 Reviewed-on: http://review.gluster.org/10321 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Fix Rsync hang issueAravinda VK2015-05-071-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | When rsync is executed using Python subprocess, by default stdout of subprocess will be None. With the log rsync performance patch stdout is assigned to PIPE. Rsync writes to that PIPE whenever it syncs files. If log_rsync_performance is disabled then nobody will consume stdout and that gets full. Rsync hangs if PIPE is full. log_rsync_performance option is introduced with patch 10070 With this patch stdout=PIPE only if log_rsync_performance is enabled. Also removed -v option from Rsync. Thanks Venky and Kotresh for RCA. BUG: 1218552 Change-Id: I4ebcfb6999358c8e2c147f7964255bd836ed7499 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/10556 Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* NFS-Ganesha: Add-node and delete-node changes in HA implementationMeghana Madhusudhan2015-05-071-4/+60
| | | | | | | | | | | | | | | NFS-Ganesha related config files have to be copied over to the new node and NFS-Ganesha service has to be started. Similary NFS-Ganesha service has to be stopped when a node is deleted from the HA cluster. Change-Id: Ia38e72cac86713fe23b7d1b829a256637a9ca796 BUG: 1212816 Signed-off-by: Meghana Madhusudhan <mmadhusu@redhat.com> Reviewed-on: http://review.gluster.org/10596 Reviewed-by: soumya k <skoduri@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* snapshot: Handshake with glusterd is not properMohammed Rafi KC2015-05-073-20/+167
| | | | | | | | | | | | | | | | | | | | If a snap is activated or deactivated, when a node is down, it is not retrieving the data properly during the handshake of glusterd With this patch, a version check will made when a glusterd is started running. If there is a mismach in version, then peers will exchange the healed data. Change-Id: I8bd2a347723db2194d3fa73295878b4dd2e9be5d BUG: 1122377 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/9664 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Avra Sengupta <asengupt@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Tested-by: NetBSD Build System Reviewed-by: Kaushal M <kaushal@redhat.com>
* tests: Spurious failure in fop-sanity.tNithya Balachandran2015-05-073-18/+21
| | | | | | | | | | | | | | | | | | Modified the calls to open in fops-sanity.c to pass in the mode as well if flags includes O_CREAT (as per man page). The missing mode randomly caused T files to be created causing DHT to treat them as linkto files and fail the fop. Modified 2 other files where the mode was not being provided. Change-Id: I047573d43655b4957d0703f7df36238f7e729c1f BUG: 1218951 Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/10590 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* stripe: set ENOENT when a READ hits EOFNiels de Vos2015-05-071-0/+5
| | | | | | | | | | | | | | | | | | | | | | | The NFS-server sets EOF only in the READ reply when op_errno is set to ENOENT. Xlators are expected to set op_errno to ENOENT when EOF is reached, op_ret will contain the number of bytes returned by the READ. When an NFS-client (like VMware ESXi) do a READ that exceeds the size of the file, errno should be set to EOF and the return value contains the number of bytes that are read (from the requested offset, until the end of the file). Not setting EOF on a correct short READ, can result in errors on the NFS-client. This is not an issue with the Linux NFS-client (or VFS). Linux is smart enough to not try to read more bytes than the file contains. BUG: 1209298 Change-Id: Ib15538744908a6001d729288d3e18a432d19050b Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10142 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
* dht/rebalance: Throttle rebalanceSusant Palai2015-05-075-4/+211
| | | | | | | | | | | | Throttle value will be "normal" by default. For throttling down, a thread will be put in to sleep. And for throttling up, gf_defrag_process_dir will wake up the sleeping threads. Change-Id: I74d530e3effd6e60e6eec81ccc8ff65789fa9c13 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/10526 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/afr : Prevent inode-evict during split-brain resolutionAnuradha2015-05-078-58/+316
| | | | | | | | | | | | | | | | | | | | | 1) Provided setfattr command to set timeout for split-brain choice. 2) If split-brain inspection/resolution is being done from the mount for a file, ref the inode when split-brain-choice is set. This inode will be unconditionally unref-ed after timeout seconds set by the user/default otherwise. 3) Updated the doc and testcase to reflect the changes. Change-Id: I15c9037dee28855f21e680e7e3632e1f48dba4e1 BUG: 1209104 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/10134 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* bitrot/glusterd: Bitrot scrub pause/resume should give proper errorGaurav Kumar Garg2015-05-072-6/+77
| | | | | | | | | | | | | | bitrot scrubber paused/resume command should give proper error messages if scrubber already pause/resume and user again try to perform same operation on a volume. Change-Id: I01ad69c80f03b177535a4e5f1c95ab7709a804b0 BUG: 1210684 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Reviewed-on: http://review.gluster.org/10209 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* glusterd: Use generation number to find peerinfo in RPC notificationsKaushal M2015-05-075-8/+53
| | | | | | | | | | | | | | | | | | The generation number for each peerinfo object is unique. It can be used to find the exact peerinfo object, which is required for peer RPC notifications. Using hostname and uuid matching to find peerinfos can return incorrect peerinfos to be returned in certain cases like multi network peer probe. This could cause updates to happen to incorrect peerinfos. Change-Id: Ia0aada8214fd6d43381e5afd282e08d53a277251 BUG: 1215018 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/10495 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: remove replace brick with data migration support form cli/glusterdGaurav Kumar Garg2015-05-0722-2218/+168
| | | | | | | | | | | | | | | | Replace-brick operation with data migration support have been deprecated from gluster. With this fix replace brick command will support only one commad gluster volume replace-brick <VOLNAME> <SOURCE-BRICK> <NEW-BRICK> {commit force} Change-Id: Ib81d49e5d8e7eaa4ccb5830cfec2bc081191b43b BUG: 1094119 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Reviewed-on: http://review.gluster.org/10101 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* features/changelog: Version support for Changelog ParserAravinda VK2015-05-062-11/+70
| | | | | | | | | | | | | | | | | | As optional feature, during unlink, full path will be recorded. Changelog Version number to be bumped up to 1.2. With this patch, parser checks the version number before parsing and handles accordingly. Change-Id: Ic1ad98259c39e417029a08e26a1d4b467817e65a BUG: 1214561 Signed-off-by: Aravinda VK <avishwan@redhat.com> Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/10166 Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com> Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* cluster/ec: add separate versions for data/entry, metadataAshish Pandey2015-05-068-29/+104
| | | | | | | | | | | | | | | | | | | Adding 64 bits in "version" key of extended attributes. First 64 bits (Left) represents Data version. Last 64 bits (right) represents Meta Data version. Note: 3.7 and 3.6 version ec can't co-exist with this change because xattrop in 3.6 will fail with ERANGE as the buffer passed to it will be '8' bytes where as the value will be 16 bytes in 3.7. Where as 3.7 version clients can work with old version files. For upgrades we need to tell users to complete heals and then upgrade BUG: 1215265 Change-Id: Ib85114680cb7e75b8371c984d9f7b6401c1ffb93 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/10312 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* quota/marker: turn off inode quotas by defaultvmallika2015-05-0613-42/+289
| | | | | | | | | | | | | | | | | | | | | inode quota is a new feature implemented in glusterfs-3.7 if quota is enabled in the older version and is upgraded to a new version, we can hit setxattr spike during self-heal of inode quotas. So, when a quota is enabled, turn off inode-quotas with a xlator option. With this patch, we still account for inode quotas but only when a write operation is performed for a particular file. User will be able to query inode quotas once the Inode-quota xlator option is enabled. Change-Id: I52fb28bf7024989ce7bb08ac63a303bf3ec1ec9a BUG: 1209430 Signed-off-by: vmallika <vmallika@redhat.com> Signed-off-by: Sachin Pandit <spandit@redhat.com> Reviewed-on: http://review.gluster.org/10152 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* NFS-Ganesha : Do not empty global config file when cluster is tornMeghana Madhusudhan2015-05-061-1/+1
| | | | | | | | | | | | | | | The global config file will need new blocks to support client lock recovery. Current cleanup function empties the entire file. Deleting only "include" lines in the config file. Change-Id: I21f09e30a738d2ba01861ce480ecf906667d887b BUG: 1218854 Signed-off-by: Meghana Madhusudhan <mmadhusu@redhat.com> Reviewed-on: http://review.gluster.org/10592 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* NFS-Ganesha : Locking global options fileMeghana Madhusudhan2015-05-066-19/+111
| | | | | | | | | | | | | | | | | | | | | Global option gluster features.ganesha enable writes into the global 'option' file. The snapshot feature also writes into the same file. To handle concurrent multiple transactions correctly, a new lock has to be introduced on this file. Every operation using this file needs to contest for the new lock type. Change-Id: Ia8a324d2a466717b39f2700599edd9f345b939a9 BUG: 1200254 Signed-off-by: Meghana Madhusudhan <mmadhusu@redhat.com> Reviewed-on: http://review.gluster.org/10130 Reviewed-by: Avra Sengupta <asengupt@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: soumya k <skoduri@redhat.com> Tested-by: NetBSD Build System Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* doc/admin-guide: BitRot command line usageVenky Shankar2015-05-061-0/+58
| | | | | | | | | Change-Id: Ic3c9da566673f4d9b1d0974c5aa15121da087fba BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10607 Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com> Tested-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
* ctr/xlator: Named lookup heal of pre-existing files, before ctr was ON.Joseph Fernandes2015-05-068-6/+1021
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: The CTR xlator records file meta (heat/hardlinks) into the data. This works fine for files which are created after ctr xlator is switched ON. But for files which were created before CTR xlator is ON, CTR xlator is not able to record either of the meta i.e heat or hardlinks. Thus making those files immune to promotions/demotions. Solution: The solution that is implemented in this patch is do ctr-db heal of all those pre-existent files, using named lookup. For this purpose we use the inode-xlator context variable option in gluster. The inode-xlator context variable for ctr xlator will have the following, a. A Lock for the context variable b. A hardlink list: This list represents the successful looked up hardlinks. These are the scenarios when the hardlink list is updated: 1) Named-Lookup: Whenever a named lookup happens on a file, in the wind path we copy all required hardlink and inode information to ctr_db_record structure, which resides in the frame->local variable. We dont update the database in wind. During the unwind, we read the information from the ctr_db_record and , Check if the inode context variable is created, if not we create it. Check if the hard link is there in the hardlink list. If its not there we add it to the list and send a update to the database using libgfdb. Please note: The database transaction can fail(and we ignore) as there already might be a record in the db. This update to the db is to heal if its not there. If its there in the list we ignore it. 2) Inode Forget: Whenever an inode forget hits we clear the hardlink list in the inode context variable and delete the inode context variable. Please note: An inode forget may happen for two reason, a. when the inode is delete. b. the in-memory inode is evicted from the inode table due to cache limits. 3) create: whenever a create happens we create the inode context variable and add the hardlink. The database updation is done as usual by ctr. 4) link: whenever a hardlink is created for the inode, we create the inode context variable, if not present, and add the hardlink to the list. 5) unlink: whenever a unlink happens we delete the hardlink from the list. 6) mknod: same as create. 7) rename: whenever a rename happens we update the hardlink in list. if the hardlink was not present for updation, we add the hardlink to the list. What is pending: 1) This solution will only work for named lookups. 2) We dont track afr-self-heal/dht-rebalancer traffic for healing. Change-Id: Ia4bbaf84128ad6ce8c3ddd70bcfa82894c79585f BUG: 1212037 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/10370 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* nfs : fix for coredump caused by export/netgroup feature in the regressionJiffin Tony Thottan2015-05-061-0/+1
| | | | | | | | | Change-Id: Idaae234b9e81c40040393e748db1f61363a48ed0 BUG: 1211913 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Reviewed-on: http://review.gluster.org/10250 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* features/bitrot: Follow xattr naming conventionsVenky Shankar2015-05-065-7/+9
| | | | | | | | | | | | | | | | | Instead of "trusted.glusterfs.bit-rot.*" use "trusted.bit-rot.*" NOTE: With this patch, data on existing volumes would be resigned (which should be OK as of now since we do not expect many users as of now :-)) Change-Id: I926c7bca266a9c8f2cb35d57c4d0359aa5cecfa0 BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10181 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* dht/rebalance: Go to exit if defrag is NULLSusant Palai2015-05-051-5/+5
| | | | | | | | | | Problem: In case a defrag is null, going to out section will crash rebalance. Change-Id: I8b3ee1ad85dc23ef0e2f2dd6f912d07216bd619f Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/10582 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com>
* features/shard: Implement readv() fopKrutika Dhananjay2015-05-052-269/+635
| | | | | | | | | Change-Id: I4cc060710482de8633141170dd35f669f01f639b BUG: 1207615 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10528 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* libgfapi : anonymous fd supportJiffin Tony Thottan2015-05-059-1/+354
| | | | | | | | | | | | | | | | Anonymous fd's are floating fd assigned to a glusterfs client without a explicit file open. Here either it will create a new anonymous fd or existing anonymous fd in the client stack for requested file.The anonymous fd's are mainly used for IO's. This patch introduces two api's glfs_h_anonymous_read and glfs_h_anonymous_write which performs read and write respectively Change-Id: Id646f2220e8387b2f8bb244c848dc1db6761444f BUG: 1204651 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/9971 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* Cache-invalidation : set to on/off depending on ganesha.enable valueMeghana Madhusudhan2015-05-052-1/+15
| | | | | | | | | | | | | | | | Multi-Head NFS-Ganesha servers need upcall (cache-invalidation) support to notify them in case of any changes to the files in the backend. Hence, upcall xlator option "features.cache-invalidation" needs to be enabled when ganesha.enable is set to 'on'. Similarly, this feature needs to be disabled when ganesha.enable is set to 'off' Signed-off-by: Meghana Madhusudhan <mmadhusu@redhat.com> Change-Id: Ifdd1d50e48a2bd2a388f73c0b9e318c6092ac190 BUG: 1213752 Reviewed-on: http://review.gluster.org/10581 Reviewed-by: soumya k <skoduri@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* api/libgfapi: Add support for statfs handleMohammed Rafi KC2015-05-054-0/+57
| | | | | | | | | | | | | snapd feature require statfs fops and thereby handle of statfs Change-Id: I037ee846aee3971826a73c592e15f1be27796c64 BUG: 1176837 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/10356 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* build: gluster-api-devel requires libacl-develKaleb S. KEITHLEY2015-05-051-0/+1
| | | | | | | | | | | building libvirt fails Change-Id: I355879e6c6f56e49c4ee78e1a9763598acacd074 BUG: 1218625 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/10579 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cleanup: Unmount all fuse mountsPranith Kumar K2015-05-051-0/+3
| | | | | | | | | Change-Id: I851b4bd74b595d739f7bf9eea920d07e31fa5fbc BUG: 1202244 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10540 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* guster/dht: tiered volumes may not allow access to files undergoing migrationDan Lambright2015-05-051-0/+22
| | | | | | | | | | | | | | | | | | | | If a read IO occurs against a file that has reached rebalance phase 2, we redirect the IO to the destination. For tiered volumes, when we try to reopen the file (on the destination), the lower level DHT receives the open call and fails; it does not have a "cached subvol". Fix is to "teach" the lower level DHT of the new location by sending a locate before the open. Change-Id: Ia4acb0035ff1da15f6a8f9ed54f43c76e8b98f5f BUG: 1214048 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Signed-off-by: root <root@gprfs018.sbu.lab.eng.bos.redhat.com> Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/10324 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* geo-rep: rename handling in dht volumeNithya Balachandran2015-05-053-0/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Background: Glusterfs changelogs are stored in each brick, which records the changes happened in that brick. Georep will run in all the nodes of master and processes changelogs "independently". Processing changelogs is in brick level, but all the fops will be replayed on "slave mount" point. Problem: With a DHT volume, in changelog "internal fops" are NOT recorded. For Rename case, Rename is recorded in "hashed" brick changelog. (DHT's internal fops like creating linkto file, unlink is NOT recorded). This lead us to inconsistent rename operations. For example, Distribute volume created with Two bricks B1, B2. //Consider master volume mounted @ /mnt/master and following operations executed: cd /mnt/master touch f1 // f1 falls on B1 Hash mv f1 f2 // f2 falls on B2 Hash // Here, Changelogs are recorded as below: @B1 CREATE f1 @B2 RENAME f1 f2 Here, race exists between Brick B1 and B2, say B2 will get executed first. Source file f1 itself is "NOT PRESENT", so it will go ahead and create f2 (Current implementation). We have this problem When rename falls in another brick and file is unlinked in Master. Similar kind of issue exists in following case too(multiple rename): CREATE f1 RENAME f1 f2 RENAME f2 f1 Solution: Instead of carrying out "changelogging" at "HASHED volume", carry out at the "CACHED volume". This way we have rename operations carried out where actual files are present. So,Changelog recorded as : @B1 CREATE f1 RENAME f1 f2 credit: sarumuga@redhat.com PS: Some of the races as the one below are _NOT_ fixed by this patch * f1 and f2 exist. B1 and B2 are their respective cached subvols. For both files hashed-subvol == cached-subvol * mv f1 f2 on master. * B1 has change-log entry of rename f1 f2 * rebalance migrates f2 from B1 and B2 * mv f2 f1 on master. * B2 has change-log entry of rename f2 f1 Since changelog entries (rename f1 f2) and (rename f2 f1) are processed independently by gsyncds, which of either f1 and f2 survives on slave is subject to race. Note that on master its file f1 with name f1 which survived. On slave it can be either file f1 with name f1 or file f2 with name f2 based on who wins the race of processing changelog. Change-Id: Iebc222f582613924c3a7cba37fb6d3e2d8332eda BUG: 1141379 Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/10410 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* dht/tier/rebalancer: Fix reset of tiering client pidJoseph Fernandes2015-05-054-3/+5
| | | | | | | | | | | | | | | | In the patch http://review.gluster.org/#/c/9657 the client pid set by tiering migration was getting over- written in dht_start_rebalance_task(). Just corrected it in dht_setxattr() before calling dht_start_rebalance_task() and removed it from dht_start_rebalance_task(). Change-Id: I37cfa111f83a4e5d498042575c93799f60b49870 BUG: 1217937 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/10502 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Susant Palai <spalai@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* tiering: Send both attach-tier and tier-start togetherMohammed Rafi KC2015-05-054-12/+122
| | | | | | | | | | | | After attaching tier, we have to start tier rebalance process. This patch is to trigger tier start along with attch-tier. Change-Id: I39380f95123f0087a82213ef263f9f33adcc5adc BUG: 1214222 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/10363 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* glusterd/tiering: disable tiering if cluster runs older gluster versionsDan Lambright2015-05-051-1/+43
| | | | | | | | | | | | Do not allow attach or detach tier to work if any of the nodes is running gluster code < 3.7. Change-Id: Id9af8f4057f6fad9cb703ec7645bc01eccb11fc1 BUG: 1218287 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/10531 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* cluster/tier: don't use hot tier until subvolumes readyDan Lambright2015-05-052-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | When we attach a tier, the hot tier becomes the hashed subvolume. But directories may not yet have been replicated by the fix layout process. Hence lookups to those directories will fail on the hot subvolume. We should only go to the hashed subvolume once the layout has been fixed. This is known if the layout for the parent directory does not have an error. If there is an error, the cold tier is considered the hashed subvolume. The exception to this rules is ENOCON, in which case we do not know where the file is and must abort. Note we may revalidate a lookup for a directory even if the inode has not yet been populated by FUSE. This case can happen in tiering (where one tier has completed a lookup but the other has not, in which case we revalidate one tier when we call lookup the second time). Such inodes are still invalid and should not be consulted for validation. Change-Id: Ia2bc62e1d807bd70590bd2a8300496264d73c523 BUG: 1214289 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/10435 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: N Balachandran <nbalacha@redhat.com>
* Tests: wworkaround NetBSD failures in cdc.tEmmanuel Dreyfus2015-05-051-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | The volume reset network.compression operation cause brick processes to be restarted. If the volume is already started, a brick process is already there and the restart will fail, as the brick TCP port is already in use. Because the new brick process is not started, the volume is left with no brick online, and the volume stop operation will timeout waiting for bricks to stop. Obviosuly we have two bugs here - If volume reset network.compression needs to restart the bricks, it should first make sure the previous brick process is terminated - volume stop should not wait forever for bricks to come back online This change does not fix the bugs but just makes sure the volume is stoped before volume reset network.compression, so that the failure oes not happen. BUG: 1129939 Change-Id: I9cd5cdc767ef6ee9dd31f2121d672dc3bfdce45f Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10553 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Upcall: Send stat as part of cache_invalidation notificationsSoumya Koduri2015-05-0514-198/+317
| | | | | | | | | | | | | | | | Have added support to send attributes of both entries and its parent (include oldparent in case of RENAME fop) in the same notification request to avoid multiple rpc requests. Also, made changes in gfapi to send parent object and its attributes changed in a single upcall event. Change-Id: I92833da3bcec38d65216921c2ce4d10367c32ef1 BUG: 1200262 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/10460 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* geo-rep: Minimize rm -rf race in Geo-repAravinda VK2015-05-053-26/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While doing RMDIR worker gets ENOTEMPTY because same directory will have files from other bricks which are not deleted since that worker is slow processing. So geo-rep does recursive_delete. Recursive delete was done using shutil.rmtree. once started, it will not check disk_gfid in between. So it ends up deleting the new files created by other workers. Also if other worker creates files after one worker gets list of files to be deleted, then first worker will again get ENOTEMPTY again. To fix these races, retry is added when it gets ENOTEMPTY/ESTALE/ENODATA. And disk_gfid check added for original path for which recursive_delete is called. This disk gfid check executed before every Unlink/Rmdir. If disk gfid is not matching with GFID from Changelog, that means other worker deleted the directory. Even if the subdir/file present, it belongs to different parent. Exit without performing further deletes. Retry on ENOENT during create is ignored, since if CREATE/MKNOD/MKDIR failed with ENOENT will not succeed unless parent directory is created again. Rsync errors handling was handling unlinked_gfids_list only for one Changelog, but when processed in batch it fails to detect unlinked_gfids and retries again. Finally skips the entire Changelogs in that batch. Fixed this issue by moving self.unlinked_gfids reset logic before batch start and after batch end. Most of the Geo-rep races with rm -rf is eliminated with this patch, but in some cases stale directories left in some bricks and in mount point we get ENOTEMPTY.(DHT issue, Error will be logged in Slave log) BUG: 1211037 Change-Id: I8716b88e4c741545f526095bf789f7c1e28008cb Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/10204 Reviewed-by: Kotresh HR <khiremat@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* rdma:properly handle iobuf_pool when rdma transport is unloadedMohammed Rafi KC2015-05-052-20/+62
| | | | | | | | | | | | | | | | | We are registering iobuf_pool with rdma. When rdma transport is unloaded, we need to deregister all the buffers registered with rdma. Otherwise iobuf_arena destroy will fail. Also if rdma.so is loaded again, then register iobuf_pool with rdma Change-Id: Ic197721a44ba11dce41e03058e0a73901248c541 BUG: 1200704 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/9854 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
* build: Autogenerated files delivered in dist tarballKaleb S. KEITHLEY2015-05-051-0/+2
| | | | | | | | Change-Id: I99028fe6ff4547f837b0997543234a9020b75607 BUG: 1216067 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/10429 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Coverity fixGaurav Kumar Garg2015-05-052-4/+21
| | | | | | | | | | | | | | | | | | | CID: 1293504 (Calling xlator_set_option without checking return value ) CID: 1293502 (Dereferencing a pointer that might be null xl when calling xlator_set_option) CID: 1293500 (Assigning value from dict_get_int32(dict, "type", &type) to ret here, but that stored value is overwritten before it can be used.) Change-Id: I5314fb399480df70bd77bc374e3b573f2efd5710 BUG: 1093692 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Reviewed-on: http://review.gluster.org/10201 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Kaushal M <kaushal@redhat.com>