summaryrefslogtreecommitdiffstats
path: root/xlators/features
Commit message (Collapse)AuthorAgeFilesLines
* features/changelog: Avoid creation of empty changelogsSaravanakumar Arumugam2015-05-083-23/+154
| | | | | | | | | | | | | | | | | | An empty changelog when rolled over gets unlinked and indexed with a modified path-name in htime file. The modification is "changelog" not "CHANGELOG" in basename of the empty changelog file. Change-Id: I77fd0b48b5c33c245418f5ac7a9756f08ece24d9 BUG: 1208470 Signed-off-by: Ajeet Jha <ajha@redhat.com> Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/9572 Tested-by: NetBSD Build System Reviewed-by: Aravinda VK <avishwan@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bitrot: Scrubber pause/resumeVenky Shankar2015-05-083-9/+58
| | | | | | | | | | | | | | | | | | | | | | With logical scan/scrub split, pausing filesystem scrubber is an override to the thread throttling mechanism, which effectively throttles "down" number of scrubber threads to zero. This causes scanner to wait until threads are spawned again (when resumed) thereby continuing where it left off (since the file tree walk stack is effectively preserved when the main scanner thread is waiting for scrubbers to consume scanned entries). The only catch is when scrubber daemon restarts: file tree walk stack is lost and scrubbing initiates from root. This is probably OK for now (can be changed later to persist parent directory information before entering pause state). Change-Id: I5109a749b7fccd0f5367765078f46e6522dd32a1 BUG: 1208131 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10521 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* features/bitrot: Throttle filesystem scrubberVenky Shankar2015-05-075-51/+712
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces multithreaded filesystem scrubber based on throttling option configured for a particular volume. The implementation "logically" breaks scanning and scrubbing with the number of scrubber threads auto-configured depending upon the throttle configuration. Scanning (crawling) is left single threaded (per brick) with entries scrubbed in bulk. On reaching this "bulk" watermark, scanner waits until entries are scrubbed. Bricks for a particular volume have a set of thread(s) assigned for scrubbing, with entries for each brick scrubbed in a round robin fashion to avoid scrub "stalls" when a brick (out of N bricks) is under active scrubbing. This mechanism helps us implement "pause/resume" with ease: all one need to do is to cleanup scrubber threads and let the main scanner thread "wait" untill scrubbing is resumed (where the scrubber thread(s) are spawned again), therefore continuing where we left off (unless we restart the deamons, where crawl initiates from root directory again, but I guess that's OK). [ NOTE: Throttling is optional for the signer daemon, without which it runs full throttle. However, passing "-DBR_RATE_LIMIT_SIGNER" predefined in CFLAGS enables CPU throttling (during checksum calculation) thereby avoiding high CPU usage. ] Subsequent patches would introduce CPU throttling during hash calculation for scrubber. Change-Id: I5701dd6cd4dff27ca3144ac5e3798a2216b39d4f BUG: 1207020 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10511 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/shard: Implement [f]truncate fopsKrutika Dhananjay2015-05-072-307/+802
| | | | | | | | | | | | | | To-Do: * Make ftruncate work even in the absence of path * Aggregate and update ia_blocks appropriately when a file is truncated to a lower size. Change-Id: Ifd24c2f5e80d2c3bc921261f5481251df8948126 BUG: 1207615 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10631 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glupy: fix tuntime search path and python module directory layoutEmmanuel Dreyfus2015-05-073-3/+11
| | | | | | | | | | | | | | | | | | | | 1) The glupy.so xlator should embed the runtime search path for the python libraries. Unfortunately, python-config does not gives the appprioate flags, therefore we need to also use pkg-config to obtain them 2) Fix the glupy python module directory layout so that python can import the module without problem That two fixes seems to let glupy.t pass on NetBSD again. BUG: 1129939 Change-Id: I397aa726ab8bf7d91fa0d6d870a30910a5f4a5d9 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10616 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* features/bit-rot: Token Bucket based throttlingVenky Shankar2015-05-076-9/+432
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BitRot daemons (signer & scrubber) are disk/cpu hoggers when left running full throttle. Checksum calculations (especially SHA family of hash routines) can be quite CPU intensive. Moreover periodic disk scans performed by scrubber followed by reading data blocks for hash calculation (which is also done by signer) generate lot of heavy IO request(s). This causes interference with actual client operations (be it a regular client or filesystems daemons such as self-heal, etc..) and results in degraded system performance. This patch introduces throttling based on Token Bucket Filtering[1]. It's a well known algorithm for checking (and ensuring) that data transmission conform to defined limits and generally used in packet switched networks. Linux control groups (Cgroups) uses a variant[2] of this algorithm to provide block device IO throttling (cgroup subsys "blkio": blk-iothrottle). So, why not just live with Cgroups? Cgroups is linux specific. We need to have a throttling mechanism for other supported UNIXes. Moreover, having our own implementation gives much more finer control in terms of tuning it for our needs (plus the simplicity of the alogorithm itself). Ideally, throttling should be a part of server stack (either as a separate translator or integrated with io-threads) since that's the point of entry for IO request(s) from *all* client(s). That way one could selectively throttle IO request(s) based on client PIDs (frame->root->pid), e.g., self-heal daemon, bitrot, etc.. (*actual* clients can run full throttle). This implementation avoids that deliberately (there needs to be a much more smarter queueing mechanism) and throttles CPU usage for hash calculations. This patch is just the infrastructure part with no interfaces exposed to set various throttling values. The tunable selected here (basically hardcoded) avoids 100% CPU usage during hash calculation (with some bursts cycles). We'd need much more intensive test(s) to assign values for various throttling options (lazy/normal/aggressive). [1] https://en.wikipedia.org/wiki/Token_bucket [2] http://en.wikipedia.org/wiki/Token_bucket#Hierarchical_token_bucket Change-Id: Icc49af80eeab6adb60166d0810e69ef37cfe2fd8 BUG: 1207020 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10307 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* features/changelog: Version support for Changelog ParserAravinda VK2015-05-062-11/+70
| | | | | | | | | | | | | | | | | | As optional feature, during unlink, full path will be recorded. Changelog Version number to be bumped up to 1.2. With this patch, parser checks the version number before parsing and handles accordingly. Change-Id: Ic1ad98259c39e417029a08e26a1d4b467817e65a BUG: 1214561 Signed-off-by: Aravinda VK <avishwan@redhat.com> Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/10166 Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com> Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* quota/marker: turn off inode quotas by defaultvmallika2015-05-063-8/+63
| | | | | | | | | | | | | | | | | | | | | inode quota is a new feature implemented in glusterfs-3.7 if quota is enabled in the older version and is upgraded to a new version, we can hit setxattr spike during self-heal of inode quotas. So, when a quota is enabled, turn off inode-quotas with a xlator option. With this patch, we still account for inode quotas but only when a write operation is performed for a particular file. User will be able to query inode quotas once the Inode-quota xlator option is enabled. Change-Id: I52fb28bf7024989ce7bb08ac63a303bf3ec1ec9a BUG: 1209430 Signed-off-by: vmallika <vmallika@redhat.com> Signed-off-by: Sachin Pandit <spandit@redhat.com> Reviewed-on: http://review.gluster.org/10152 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ctr/xlator: Named lookup heal of pre-existing files, before ctr was ON.Joseph Fernandes2015-05-066-4/+945
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: The CTR xlator records file meta (heat/hardlinks) into the data. This works fine for files which are created after ctr xlator is switched ON. But for files which were created before CTR xlator is ON, CTR xlator is not able to record either of the meta i.e heat or hardlinks. Thus making those files immune to promotions/demotions. Solution: The solution that is implemented in this patch is do ctr-db heal of all those pre-existent files, using named lookup. For this purpose we use the inode-xlator context variable option in gluster. The inode-xlator context variable for ctr xlator will have the following, a. A Lock for the context variable b. A hardlink list: This list represents the successful looked up hardlinks. These are the scenarios when the hardlink list is updated: 1) Named-Lookup: Whenever a named lookup happens on a file, in the wind path we copy all required hardlink and inode information to ctr_db_record structure, which resides in the frame->local variable. We dont update the database in wind. During the unwind, we read the information from the ctr_db_record and , Check if the inode context variable is created, if not we create it. Check if the hard link is there in the hardlink list. If its not there we add it to the list and send a update to the database using libgfdb. Please note: The database transaction can fail(and we ignore) as there already might be a record in the db. This update to the db is to heal if its not there. If its there in the list we ignore it. 2) Inode Forget: Whenever an inode forget hits we clear the hardlink list in the inode context variable and delete the inode context variable. Please note: An inode forget may happen for two reason, a. when the inode is delete. b. the in-memory inode is evicted from the inode table due to cache limits. 3) create: whenever a create happens we create the inode context variable and add the hardlink. The database updation is done as usual by ctr. 4) link: whenever a hardlink is created for the inode, we create the inode context variable, if not present, and add the hardlink to the list. 5) unlink: whenever a unlink happens we delete the hardlink from the list. 6) mknod: same as create. 7) rename: whenever a rename happens we update the hardlink in list. if the hardlink was not present for updation, we add the hardlink to the list. What is pending: 1) This solution will only work for named lookups. 2) We dont track afr-self-heal/dht-rebalancer traffic for healing. Change-Id: Ia4bbaf84128ad6ce8c3ddd70bcfa82894c79585f BUG: 1212037 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/10370 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bitrot: Follow xattr naming conventionsVenky Shankar2015-05-063-3/+4
| | | | | | | | | | | | | | | | | Instead of "trusted.glusterfs.bit-rot.*" use "trusted.bit-rot.*" NOTE: With this patch, data on existing volumes would be resigned (which should be OK as of now since we do not expect many users as of now :-)) Change-Id: I926c7bca266a9c8f2cb35d57c4d0359aa5cecfa0 BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10181 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/shard: Implement readv() fopKrutika Dhananjay2015-05-052-269/+635
| | | | | | | | | Change-Id: I4cc060710482de8633141170dd35f669f01f639b BUG: 1207615 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10528 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* Upcall: Send stat as part of cache_invalidation notificationsSoumya Koduri2015-05-055-137/+123
| | | | | | | | | | | | | | | | Have added support to send attributes of both entries and its parent (include oldparent in case of RENAME fop) in the same notification request to avoid multiple rpc requests. Also, made changes in gfapi to send parent object and its attributes changed in a single upcall event. Change-Id: I92833da3bcec38d65216921c2ce4d10367c32ef1 BUG: 1200262 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/10460 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* Upcall: Cleanup expired client entriesSoumya Koduri2015-05-044-18/+195
| | | | | | | | | | | | | | | | | | | | | To cleanup expired client entries (with access_time > 2*CACHE_INVALIDATION_TIMEOUT), have * defined a global list to contain all the upcall_inode_ctx allocated * Every time a upcall_inode_ctx is allocated, it is added to the global list * during inode_forget, that upcall_inode_ctx is marked for destroy * created a reaper thread which scans through that list * cleans up expired client entries * frees the inode_ctx with destroy_mode set. Note: This reaper thread is initialized only when features.cache_invalidation option is enabled. Change-Id: Iea2a63eb31b8e08d5709e7e090cf26fd13d01265 BUG: 1200267 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/10342 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* feature/changelog: Capture path for deletesKotresh HR2015-05-046-5/+188
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PROBLEM: There is no way to get the path of deleted file if we have gfid from changelog since the file is already deleted. SOLUTION: Do a recursive readlink on parent gfid in backend .glusterfs path to get the complete path in I/O callpath in changelog translator and capture it in callback. The path captured is relative from the brick root. The field separator used is '\0'. e.g., ......\0<pgfid>/bname\0<relative-path>\0<next-record> ADDITIONAL REQUIRED CHANGES: 1. The changelog translator option called "changelog.capture-del-path" is introduced to enable or disable the capturing of deleted entry path. e.g., gluster vol set <vol-name> changelog.capture-del-path on/off If capture-del-path is disabled, '\0' is captured instead of relative path. e.g., ......\0<pgfid>/bname\0\0\0<next-record> 2. The minor number in the version of changelog is bumped up from v1.1 to v1.2. 3. If recursive readlink is failed for some reason, it will capture \0 in place of <relative path>. e.g., ......\0<pgfid>/bname\0\0\0<next-record> (same as when caputre-del-path option is disabled) 4. If bname argument passed to "resolve_pargfid_to_path" function is NULL and pargfid is ROOT, "." is returned. This is not the case with changelog, where bname is always passed. This is applicable to other consumers of "resolve_pargfid_to_path" routine. NOTE: Changelog parser should consider the above new changes and should parse accordingly. Change-Id: I040ed429b5aa7d391033fc6a540edbf07fc37827 BUG: 1214561 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/10288 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: NetBSD Build System
* rpc: Maintain separate xlator pointer in 'rpcsvc_state'Kotresh HR2015-05-043-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The structure 'rpcsvc_state', which maintains rpc server state had no separate pointer to track the translator. It was using the mydata pointer itself. So callers were forced to send xlator pointer as mydata which is opaque (void pointer) by function prototype. 'rpcsvc_register_init' is setting svc->mydata with xlator pointer. 'rpcsvc_register_notify' is overwriting svc->mydata with mydata pointer. And rpc interprets svc->mydata as xlator pointer internally. If someone passes non xlator structure pointer to rpcsvc_register_notify as libgfchangelog currently does, it might corrupt mydata. So interpreting opaque mydata as xlator pointer is incorrect as it is caller's choice to send mydata as any type of data to 'rpcsvc_register_notify'. Maintaining two different pointers in 'rpcsvc_state' for xlator and mydata solves the issue. Change-Id: I7874933fefc68f3fe01d44f92016a8e4e9768378 BUG: 1215161 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/10366 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* glupy: package glupy as a subpackage under gluster namespaceHumble Devassy Chirammal2015-05-042-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently glupy files resides in gluster namespace of python site packages. The other projects like libgfapi-python ..etc are evolving and need to share the gluster namespace. The current structure makes things difficult as all subpackages have its own __init__ files and other files. One subpackage can not any more own gluster namespace. The attempt is to make below structure for gluster namespace so that it is more portable and scalable for future use. <sitepackages>/gluster/ | -- __init__.py | | -- glupy | -- __init__.py -- glupy.py -- ........ | | -- gfapi | -- __init__.py -- gfapi.py -- ........ By above structure clients can import: >>> from gluster import glupy >>> from gluster import gfapi libgfapi-python project has been moved to this structure via http://review.gluster.org/#/c/9668/ Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com> Change-Id: I54886200ddb6a4153a74d9e187aeca7cad79ef9e BUG: 1211900 Reviewed-on: http://review.gluster.org/10248 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* features/bitrot: Per volume bitrot translatorGaurav Kumar Garg2015-05-031-0/+20
| | | | | | | | | | | | | | | | | | | | Currently whatever bitrot/scrubber tunable value user set for one volume that value is considering for all other volumes also. Each volume should act on their respective bitrot/scrubber tunable value. For handling bitrot/scrubber tunable value independently with respect to all the volume bitrot and scrubber translator should run seperatly for each volume. Change-Id: I1d9379508afe6cfd2f78e3ebf29c829c362d84a9 BUG: 1170075 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Reviewed-on: http://review.gluster.org/10352 Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* features/shard: Add "is-directory" checks in stat/fstatKrutika Dhananjay2015-05-031-0/+12
| | | | | | | | | | | | | | | | During mount, NFS directly calls stat on the root of the volume without sending a lookup on it. This was causing inode_ctx_get_block_size() to fail on /. A check is now added in [f]stat which would ensure no action is taken by shard xlator when the operation is on a directory. Change-Id: I81849eeddfdad9f271155442408d95b4a25d7647 BUG: 1207615 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10427 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Upcall: Handle missing fops in the upcall xlatorSoumya Koduri2015-05-034-130/+969
| | | | | | | | | | Change-Id: I968980dc4df458ec427e33503363bbd017e1163e BUG: 1200271 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/10194 Tested-by: NetBSD Build System Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* features/changelog: Consider only changelog on/off as changelog breakageKotresh HR2015-05-024-11/+211
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Earlier, both chagelog on/off and brick restart were considered to be changelog breakage and treated as changelog not being continuous. As a result, new HTIME.TSTAMP file was created on both the above cases. Now the change is made such that only on changelog enable/disable, the changelog is considered to be discontinuous. New HTIME.TSTAMP file is not created on brick restart, the changelogs files are appended to last HTIME.TSTAMP file. Treating changelog as continuous in above scenario is important as changelog history API will fail otherwise. It can successfully get changes between start and end timestamps only when changelog is continuous (Changelogs in single HTIME.TSTAMP file are treated as continuous). Without this change, changelog history API would fail, and it would become necessary to fallback to other mechanisms like xsync FSCrawl in case geo-rep to detect changes in this time window. But Xsync FSCrawl would not be applicable to other consumers like glusterfind. Rationale: 1. In plain distributed volume, if brick goes down, no I/O can happen onto the brick. Hence changelog is intact with data on disk. 2. In distributed replicate volume, if brick goes down, since self-heal traffic is captured in changelog. Eventually, I/O happened whend brick down is captured in changelog. Change-Id: I2eb66efe6ee9a9228fb1fcb38d6e7696b9559d5b BUG: 1211327 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/10222 Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System
* features/shard: Take hole size into account while computing ia_sizeKrutika Dhananjay2015-05-022-2/+10
| | | | | | | | | | Change-Id: I1a90ad6669c1cb79aaae6b4bd9621c75d9985c8a BUG: 1207615 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10446 Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* ctr/libgfdb: Performance enhancer for unlink/create/rename/link fopsJoseph Fernandes2015-05-013-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | 1) ctr_link_consistency option for ctr xaltor is provided so that the user can choose to switch it on or off. /* For link consistency we do a double update i.e mark the link * during the wind and during the unwind we update/delete the link. * This has a performance hit. We give a choice here whether we need * link consistency to be spoton or not using link_consistency flag. * This will have only one link update */ 2) In delete the wind time recording is moved to unwind path. /* Special performance case: * Updating wind time in unwind for delete. This is done here * as in the wind path we will not know whether its the last * link or not. For a last link there is not use to update any * wind or unwind time!*/ Change-Id: I209472fb816f939db4a868b97ba053b028f17ea6 BUG: 1217786 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/10170 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* changelog: Fixing buffer overrun coverity issues.Nandaja Varma2015-04-302-7/+7
| | | | | | | | | | | | | | | | Coverity IDs: 1214630 1214631 1214633 1234643 Change-Id: I172c4f49bf651b2324522f9e661023f73ca05339 BUG: 789278 Signed-off-by: Nandaja Varma <nvarma@redhat.com> Reviewed-on: http://review.gluster.org/9557 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Sakshi Bansal Reviewed-by: Venky Shankar <vshankar@redhat.com>
* Upcall: Process each of the upcall events separatelySoumya Koduri2015-04-303-13/+9
| | | | | | | | | | | | | | | | As suggested during the code-review of Bug1200262, have modified GF_CBK_UPCALL to be exlusively GF_CBK_CACHE_INVALIDATION. Thus, for any new upcall event, a new CBK procedure will be added. Also made changes to store upcall data separately based on the upcall event type received. BUG: 1200262 Change-Id: I0f5e53d6f5ece16aecb514a0a426dca40fa1c755 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/10049 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* NFS-Ganesha: Handling CLI commands when NFS-Ganesha keys are setMeghana Madhusudhan2015-04-301-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When ganesha.enable is set to on and features.ganesha is enabled, there are a few behaviour changes that should be seen in other volume operations. 1. ganesha.enable can be set to 'on' only when features.ganesha is set to 'enable' 2.When gluster vol is started, and if ganesha.enable key was set to 'on', it should automatically export the volume via NFS-Ganesha. 3.When ganesha.enable is set to 'on', and a volume is stopped, that volume should be unexported via NFS-Ganesha. 4. gluster vol reset <volname> If ganesha.enable was set to on, then unexport the volume via NFS-Ganesha. 5. gluster vol reset all If features.ganesha is set to enable, as part of reset all, set it to disable. This translates to teardown cluster. All the above problems are fixed by checking the global key and value, depending on the value, specific functions are called. And also, functions related to global commands are moved to cli-cmd-global.c Commit phase of features.ganesha enable/disable runs the ganesha-ha.sh setup/teardown respectively. Before the script begins, it is important that the NFS-Ganesha service starts on all the HA nodes. Having the start service commands in the commit phase could lead to problems. Moving the pre-requisite service start commands to the 'stage' phase. Change-Id: I5a256f94f8e1310ddcd5369f329b7168b2a24c47 BUG: 1200265 Signed-off-by: Meghana Madhusudhan <mmadhusu@redhat.com> Reviewed-on: http://review.gluster.org/10283 Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* features/bitrot: Use global timer wheelVenky Shankar2015-04-301-23/+8
| | | | | | | | | | | Change-Id: I761927ea263b4144b851881f25791fda5b794f59 BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10381 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* bitrot: Scrubber log should report 'bad' file detection as ALERT in logGaurav Kumar Garg2015-04-301-2/+2
| | | | | | | | | | | | If scrubber detect any bad object by mismatching of checksum of scrubber and signer then log messages shold come as a Alert instead of warning. Change-Id: I075d80700cbe6182e525a04419a80ab18419ff91 BUG: 1210687 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Reviewed-on: http://review.gluster.org/10226 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* quota: Validate NULL inode from the entries received in readdirp_cbkvmallika2015-04-302-6/+13
| | | | | | | | | | | | | | | | | | | In quota readdirp_cbk, inode ctx filled for the all entries received. In marker readdirp_cbk, files/directories are inspected for dirty There is no guarantee that entry->inode is populated. If entry->inode is NULL, this needs to be treated as readdir Change-Id: Id2d17bb89e4770845ce1f13d73abc2b3c5826c06 BUG: 1215550 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/10416 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* features/shard: Implement rename() fopKrutika Dhananjay2015-04-292-44/+261
| | | | | | | | | | | Change-Id: Ia39fa0f29eac84c18d13a94f704b1703565b504e BUG: 1207615 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10373 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* bitrot/scrub: fix induced throttling in syncop_ftw_throttle()Venky Shankar2015-04-261-12/+14
| | | | | | | | | | | | | | | | | | | | | | | | Failing to reset scanning counter causes "incorrect" delay of around 50 seconds per directory entry. This causes scrubber to run extremely slowly. [ NOTE: This is a temporary fix. With the introduction of token bucket based throttling, inducing throttle via sleep() call would be unneeded. ] Also, fix logging messages in scrubber to log brick and full path of the object which is identified/marked as corrupted. Change-Id: Id501bd15dcdbd8a09613f80f9d84050304740027 BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10375 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
* features/shard: Implement unlink fopKrutika Dhananjay2015-04-252-233/+495
| | | | | | | | | | | Change-Id: I5cdd805821a4f3657f490223b97f42c724ee588f BUG: 1207615 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10249 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/trash : Notify CTR translator if an unlink happens to a fileJiffin Tony Thottan2015-04-243-13/+76
| | | | | | | | | | | | | | | | | | | | | This implementation is same as the posix_unlink_cbk() where CTR sends a request during a unlink to send the number of links to the inode and posix obliges sending it using the unwind xdata dict. For Trash xlator a unlink is stat + mkdir(if parent is not present) + rename. And hence this is handled in trash_unlink_rename_cbk(). Change-Id: I402e83567b88e3c9fe171379693c82937af567f9 BUG: 1205545 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Signed-off-by: Joseph Fernandes <josferna@redhat.com> Signed-off-by: Anoop C S <achiraya@redhat.com> Reviewed-on: http://review.gluster.org/9989 Tested-by: NetBSD Build System Tested-by: Joseph Fernandes Reviewed-by: Joseph Fernandes Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/trash: Minor coverity fixesAnoop C S2015-04-231-4/+23
| | | | | | | | | | | | | | | | | | CID 1288784 CID 1288785 CID 1288795 CID 1288796 CID 1288797 CID 1288802 Change-Id: I51dd7653a2dce3b7b6387e5d91c1c07eb157a04b BUG: 789278 Signed-off-by: Anoop C S <achiraya@redhat.com> Reviewed-on: http://review.gluster.org/10315 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Tested-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
* features/shard: Consume size and block count in metadata read opsKrutika Dhananjay2015-04-212-98/+271
| | | | | | | | | | | | | | | | Metadata read fops like lookup, stat etc will now fetch the xattr that holds the size and block count information, extract the size and block count fields and set them in respective stbuf before unwinding the resultant iatt to the parent xlator. Change-Id: I881be8955092fa6b75f8b0e4f3deb01344cb638e BUG: 1207603 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10098 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* upcall: fix use-after free (CID 1288760, 1288761)Michael Adam2015-04-171-4/+4
| | | | | | | | | | | | | | | Coverity IDs: 1288760 - Read from pointer after free 1288761 - Use after free. Change-Id: Ide9405b9c30a3e27941054a4ae61f585ef09cd8c BUG: 789278 Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-on: http://review.gluster.org/10242 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* quota/marker: fix inode quota healing after glusterfs upgradevmallika2015-04-161-3/+7
| | | | | | | | | | | | | | | | | | | | There is a problem during upgrade where, inode quotas are not healed in the contri xattrs. Healing happens if contri xattrs are missing. But healing doesn't happen if contri xattrs are present and inode quota values are missing in the contri xattrs. This patch fixes the problem Change-Id: I6c88b74b5bb333a97c5419e24cc4ada82839f474 BUG: 1211808 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/10239 Tested-by: NetBSD Build System Reviewed-by: Sachin Pandit <spandit@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* NFS-Ganesha : Fixing HA script invocation and othersMeghana Madhusudhan2015-04-131-0/+16
| | | | | | | | | | | | | | gluster features.ganesha disable failed invariably. And also, there were problems in unexporting volumes dynamically.Fixed the above problems. Change-Id: I29aa289dc8dc7b39fe0fd9d3098a02097ca8ca0c BUG: 1207629 Signed-off-by: Meghana Madhusudhan <mmadhusu@redhat.com> Reviewed-on: http://review.gluster.org/10199 Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: NetBSD Build System
* features/bit-rot: Mark versioning fsetxattr as internal fopKotresh HR2015-04-112-5/+10
| | | | | | | | | | | | | | | | | Changelog xlator was capturing bitrot-stub's fsetxattr sent for versioning. Since it was using the same frame as of the create fop, there was inconsistency in fop number and gfid of capturing metadata. So fix is to mark fsetxattr used for versioning as internal and add internal fop filter in changelog_fsetxattr. Change-Id: I51ff468995139838b22bf293a59a0713a92ee7a5 BUG: 1170075 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/10148 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* build: make contrib/uuid dependency optionalNiels de Vos2015-04-104-13/+5
| | | | | | | | | | | | | | | | | | | On Linux systems we should use the libuuid from the distribution and not bundle and statically link the contrib/uuid/ bits. libglusterfs/src/compat-uuid.h has been introduced and should become an abstraction layer for different UUID APIs. Non-Linux operating systems should implement their compatibility layer there. Once all operating systems have an implementation in compat-uuid.h, we can remove contrib/uuid/ from the repository completely. Change-Id: I345e5357644be2521685e00358bb8c83c4ea0577 BUG: 1206587 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10129 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/quota: Fixing Logically dead codearao2015-04-101-5/+0
| | | | | | | | | | | | | CID: 1134007 The code never reaches the condition check on retlen in ret label, hence removing the dead code. Change-Id: Ia0108b69489bb78a2561ff8da6e00685f472ae82 BUG: 789278 Signed-off-by: arao <arao@redhat.com> Reviewed-on: http://review.gluster.org/9644 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bitrot-stub: Packed format for version xattrVenky Shankar2015-04-091-1/+1
| | | | | | | | | | | | | Using __attribute__ ((__packed__)) for object signature xattr saves some bytes (7 bytes to be particular) occupied by the extended attribute on-disk as compared to the unpacked format. Change-Id: I91a6a0a54aa60e6fd8c357d72f7601b6ed213f2d BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10161 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* snapshot: correct error message in snapshot helperHumble Devassy Chirammal2015-04-081-1/+1
| | | | | | | | | | | Change-Id: Ic01a5d4115383f1245bae3fba2bf92e23c8213ff BUG: 1194640 Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com> Reviewed-on: http://review.gluster.org/9747 Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* features/shard: Introduce file size xattrKrutika Dhananjay2015-04-083-43/+268
| | | | | | | | | | | | | | | With each inode write FOP, the size and block count of the file will be updated within the xattr. There are two 64 byte fields that are intentionally left blank for now for future use when consistency guarantee is introduced later in sharding. Change-Id: I40a2e700150c1f199a6bf87909f063c84ab7bb43 BUG: 1207603 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10097 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* libglusterfs/syncop: Add xdata to all syncop callsRaghavendra Talur2015-04-087-49/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for xdata in both the request and response path of syncops. Few calls like lookup already had the support; have renamed variables in few places to maintain uniformity. xdata passed downwards is known as xdata_in and xdata passed upwards is known as xdata_out. There is an old patch by Jeff Darcy at http://review.gluster.org/#/c/8769/3 which does the same for some selected calls. It also brings in xdata support at gfapi level. xdata support at gfapi level would be introduced in subsequent patches. Change-Id: I340e94ebaf2a38e160e65bc30732e8fe1c532dcc BUG: 1158621 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: http://review.gluster.org/9859 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bitrot-stub: header file update in noinst_HEADERSVenky Shankar2015-04-081-1/+2
| | | | | | | | | | | Missing "bit-rot-object-version.h" causing devrpm failures. Change-Id: I5af326c5871cc468a10dece4772b29eda06c4fa9 BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10160 Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* ctr : Fix for heating of files during promotion/demotionJoseph Fernandes2015-04-083-44/+127
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fix will solve the heating of the files during the promotion or demotion. Promotion: ~~~~~~~~~ When a file gets promoted it get the current time stamp during creation only, but following writes or reads during the migration wont heat the file. Demotion: ~~~~~~~~ When a file gets demoted it get the wind/unwind time stamp is set to zero. The following writes or reads during the migration wont heat the file. What is remaining ? ~~~~~~~~~~~~~~~~~ Bug 1209129 ( https://bugzilla.redhat.com/show_bug.cgi?id=1209129 ) Inspite of this fix there is still a issue remaining, i.e the heat of the file is not keep intact during a internal rebalance activity i.e a rebalance within a tier. Change-Id: I01e82dc226355599732d40e699062cee7960b0a5 BUG: 1207867 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Signed-off-by: Dan Lambright <dlambrig@redhat.com> Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/10080 Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/changelog: fix possible illegal memory access (CID 1288819)Michael Adam2015-04-081-1/+2
| | | | | | | | | | | | | | | | Coverity CID 1288819 strncpy executed with a limit equal to the target array size potentially leaves the target string not null terminated. Make sure the copied string is a valid 0 terminated string. Change-Id: Ie2d2970f37840146aa18724be3b89e93194c8160 BUG: 789278 Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-on: http://review.gluster.org/10062 Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* bitrot/scrub: Scrubber fixesVenky Shankar2015-04-085-82/+212
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a handful of problem with scrubber which are detailed below. Scrubber used to skip objects for verification due to missing fd iterface to fetch versioning extended attributes. Similar to the inode interface, an fd based interface in POSIX is now introduced. Moreover, this patch also fixes potential false reporting by scrubber due to: An object gets dirtied and signed when scrubber is busy calculatingobject checksum. This is fixed by caching the signed version when an object is first inspected for stalenes, i.e., during pre-compute stage. This version is used to verify checksum in the post-compute stage when the signatures are compared for possible corruption. Side effect of _not_ sending signature length during signing resulted in "truncated" signature to be set for an object. Now, at the time of signing, the signature length is sent and is used in place of invoking strlen() to get signature length (which could have possible 00s). The signature length itself is not persisted in the signature xattr, but is calculated on-the-fly by substracting the xattr length by the "structure" header size. Some of the log entries are made more meaningful (as and aid for debugging). Change-Id: I938bee5aea6688d5d99eb2640053613af86d6269 BUG: 1207624 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10118 Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* tests/bitrot-stub: Object versioning test(s)Venky Shankar2015-04-082-16/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces basic object versioning test(s) which is required for bitrot detection to work correctly. Basic test(s) such as opening a file in read-only mode, single open, multiple open()s are covered on FUSE mount _only_ as stub does not support anonymous fds yet. For this reason, the test case disables open-behind. Actual verification is implemented as a C source which makes use of the same on-disk data structures as used by the stub code. The data structures are moved to separate header file which is included by the test script. Such modularization helps in future enhancements to keep the version "data type" opaque and provide handful of APIs version checking (equal/greater/etc..). [ This is just a start and should grow over time as stub is enhanced and codebase matures. ] Change-Id: Ibee20e65a15b56bbdd59fd2703f9305b115aec7a BUG: 1201724 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10140 Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bitrot-stub: Enhancement to versioning protocolVenky Shankar2015-04-082-57/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | .. and potential bug fixes / memleak. While assigning initial version to an object, both extended attributes (namely, ongoing version and the default signing version) were persisted. This is optimized to just persist the ongoing version along with safe handling of xattr request(s) in it's absence. This is better than the earlier approach as the two xattr sets were not atomic anyway (allowing a request to sneak in between between two set operations). This also allows to perform sanity checks on objects during lookup()/getxattr(): objects with missing ongoing version but presence of signature are possible candidates of tampering (and catching implementation bugs). There were couple of instances in the code where versioning xattrs were incorrectly removed before in-memory versions were initialized, which have been fixed with this patch. A memory leak in the IPC code path is also fixed. Change-Id: I01c690ccfe7156a883582275f40f79a7c10c0900 BUG: 1207054 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10117 Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>