summaryrefslogtreecommitdiffstats
path: root/xlators/features
Commit message (Collapse)AuthorAgeFilesLines
...
| * features/quota: remove in-memory accounting of files in enforcerRaghavendra G2014-01-252-221/+210
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Accounting was done in enforcer (though marker is the ultimate source of truth) to offset cached directory size becoming stale. However, with enforcer being moved to brick we can no longer maintain correct cluster wide size for a directory. Hence removing accounting code from enforcer. Change-Id: I5ea94234da4da85ed5f5ced1354d8de3454b3fcb BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6434 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
| * features/quota: use STACK_WIND_TAIL when quota is turned off.Raghavendra G2014-01-241-168/+207
| | | | | | | | | | | | | | | | | | Change-Id: I8a0b7f3a1995a72560c210efbad1eaafb0bdf329 BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6373 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
| * gfid-access: fix the issue of entry creation with wrong gfidAmar Tumballi2014-01-231-6/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * dict_set was happening with a string instead of 'uuid_t' causing entry creations to happen with wrong gfid * revalidate was causing excessive ESTALE logs as lookup was happening on '.gfid/' path itself causing server to become clueless Change-Id: I3b76ce7fdec9c2ff785be21f506f859f489f80f0 BUG: 1054199 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.org/6520 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
| * quota: filter glusterfs quota xattrsSusant Palai2014-01-221-34/+22
| | | | | | | | | | | | | | | | | | Change-Id: I86ebe02735ee88598640240aa888e02b48ecc06c BUG: 1040423 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/6490 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
| * locks: set @lock->frame = NULL when lock is grantedAnand Avati2014-01-222-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | This way disconnect cleanup code can differentiate which locks are granted vs blocked. Change-Id: I2a835c6865b6c804231d852953ea84eeccef35a3 BUG: 849630 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6730 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
* | Merge branch 'upstream'Jeff Darcy2014-01-217-43/+243
|\| | | | | | | | | | | | | | | Conflicts: api/src/glfs-fops.c api/src/glfs-handleops.c Change-Id: I6811674cc4ec4be6fa6e4cdebb4bc428194bebd8
| * quota: get directory size before enforcing quota on renameKrishnan Parthasarathi2014-01-202-12/+72
| | | | | | | | | | | | | | | | | | | | Change-Id: If18cab5992ddc91457782786942971deb1b51ead BUG: 1023974 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6155 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
| * syncop: Change return value of syncopPranith Kumar K2014-01-193-17/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: We found a day-1 bug when syncop_xxx() infra is used inside a synctask with compilation optimization (CFLAGS -O2). Detailed explanation of the Root cause: We found the bug in 'gf_defrag_migrate_data' in rebalance operation: Lets look at interesting parts of the function: int gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t *defrag, loc_t *loc, dict_t *migrate_data) { ..... code section - [ Loop ] while ((ret = syncop_readdirp (this, fd, 131072, offset, NULL, &entries)) != 0) { ..... code section - [ ERRNO-1 ] (errno of readdirp is stored in readdir_operrno by a thread) /* Need to keep track of ENOENT errno, that means, there is no need to send more readdirp() */ readdir_operrno = errno; ..... code section - [ SYNCOP-1 ] (syncop_getxattr is called by a thread) ret = syncop_getxattr (this, &entry_loc, &dict, GF_XATTR_LINKINFO_KEY); code section - [ ERRNO-2] (checking for failures of syncop_getxattr(). This may not always be executed in same thread which executed [SYNCOP-1]) if (ret < 0) { if (errno != ENODATA) { loglevel = GF_LOG_ERROR; defrag->total_failures += 1; ..... } the function above could be executed by thread(t1) till [SYNCOP-1] and code from [ERRNO-2] can be executed by a different thread(t2) because of the way syncop-infra schedules the tasks. when the code is compiled with -O2 optimization this is the assembly code that is generated: [ERRNO-1] 1165 readdir_operrno = errno; <<---- errno gets expanded as *(__errno_location()) 0x00007fd149d48b60 <+496>: callq 0x7fd149d410c0 <address@hidden> 0x00007fd149d48b72 <+514>: mov %rax,0x50(%rsp) <<------ Address returned by __errno_location() is stored in a special location in stack for later use. 0x00007fd149d48b77 <+519>: mov (%rax),%eax 0x00007fd149d48b79 <+521>: mov %eax,0x78(%rsp) .... [ERRNO-2] 1281 if (errno != ENODATA) { 0x00007fd149d492ae <+2366>: mov 0x50(%rsp),%rax <<----- Because it already stored the address returned by __errno_location(), it just dereferences the address to get the errno value. BUT THIS CODE NEED NOT BE EXECUTED BY SAME THREAD!!! 0x00007fd149d492b3 <+2371>: mov $0x9,%ebp 0x00007fd149d492b8 <+2376>: mov (%rax),%edi 0x00007fd149d492ba <+2378>: cmp $0x3d,%edi The problem is that __errno_location() value of t1 and t2 are different. So [ERRNO-2] ends up reading errno of t1 instead of errno of t2 even though t2 is executing [ERRNO-2] code section. When code is compiled without any optimization for [ERRNO-2]: 1281 if (errno != ENODATA) { 0x00007fd58e7a326f <+2237>: callq 0x7fd58e797300 <address@hidden><<--- As it is calling __errno_location() again it gets the location from t2 so it works as intended. 0x00007fd58e7a3274 <+2242>: mov (%rax),%eax 0x00007fd58e7a3276 <+2244>: cmp $0x3d,%eax 0x00007fd58e7a3279 <+2247>: je 0x7fd58e7a32a1 <gf_defrag_migrate_data+2287> Fix: Make syncop_xxx() return (-errno) value as the return value in case of errors and all the functions which make syncop_xxx() will need to use (-ret) to figure out the reason for failure in case of syncop_xxx() failures. Change-Id: I314d20dabe55d3e62ff66f3b4adb1cac2eaebb57 BUG: 1040356 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6475 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
| * build: Start using library versioning for various librariesHarshavardhana2014-01-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to libtool three individual numbers stand for CURRENT:REVISION:AGE, or C:R:A for short. The libtool script typically tacks these three numbers onto the end of the name of the .so file it creates. The formula for calculating the file numbers on Linux and Solaris is /path/to/library/<library_name>.(C - A).(A).(R) As you release new versions of your library, you will update the library's C:R:A. Although the rules for changing these version numbers can quickly become confusing, a few simple tips should help keep you on track. The libtool documentation goes into greater depth. In essence, every time you make a change to the library and release it, the C:R:A should change. A new library should start with 0:0:0. Each time you change the public interface (i.e., your installed header files), you should increment the CURRENT number. This is called your interface number. The main use of this interface number is to tag successive revisions of your API. The AGE number is how many consecutive versions of the API the current implementation supports. Thus if the CURRENT library API is the sixth published version of the interface and it is also binary compatible with the fourth and fifth versions (i.e., the last two), the C:R:A might be 6:0:2. When you break binary compatibility, you need to set AGE to 0 and of course increment CURRENT. The REVISION marks a change in the source code of the library that doesn't affect the interface-for example, a minor bug fix. Anytime you increment CURRENT, you should set REVISION back to 0. Change-Id: Id72e74c1642c804fea6f93ec109135c7c16f1810 BUG: 862082 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/5645 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
| * features/quota: Handle the corner case in statfs callVarun Shastry2014-01-171-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Even though limit is not set quota used to send 'quota-deem-statfs' key in the dictionary resulting in incorrect calculations. Change-Id: I643cb35cca6648e40f1c745135280876958ddaa9 BUG: 1046894 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/6607 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
| * quota: fix recording of last alert log messageKrishnan Parthasarathi2014-01-161-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PROBLEM: Alert log messages, corresponding to disk usage crossing soft-limit on a directory, weren't being logged as often as expected. CAUSE: The mechanism in place to log alert messages, once every alert-time seconds, set the previous logged time incorrectly. FIX: Update previous logged time only if we logged an alert message, ie. when the "time was right" to alert. Change-Id: Iafcb7be11e3240e2d04b0bb29ddb88e4f4a1bd25 BUG: 969461 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6532 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
| * features/quota: Metadata cleanupVarun Shastry2014-01-161-4/+139
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quota and marker uses 'trusted.glusterfs.quota*' and 'trusted.pgfid*' xattrs to store its configurations and accounting information and also to build the parent inode chain in case of absense of path. Problem: After disabling and then enabling quota back, the xattrs may contain stale data leading to impaired accounting and thus improper enforcement. Solution: Clean up all the quota related xattrs after quota disable. Marker xlator implements a virtual xattr to cleanup quota and pgfid xattrs. In this approach glusterd mounts an auxiliary mount and sends the below command to all the files by crawling the mountpoint. #setfattr -n "glusterfs.quota-xattr-cleanup" -v 1 <path/to/file> Credit: Krishnan Parthasarathi <kparthas@redhat.com> Varun Shastry <vshastry@redhat.com> Change-Id: I9380eca58a285dc27dd572de1767aac8f2cd8049 BUG: 969461 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/6369 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* | features/changelog: couple of performance improvementsVenky Shankar2014-01-166-10/+34
| | | | | | | | | | | | | | | | | | Changes being: * Changes journal preallocation size to 512 MB * Lock-less updation of journal Change-Id: I7c2b7ee224b3d338e4fdd35cc078ea2a13251292 Signed-off-by: Venky Shankar <vshankar@redhat.com>
* | Restore NSR-specific changelog patches.Jeff Darcy2014-01-1413-664/+585
| | | | | | | | | | Change-Id: I40fb3a024086dddbdaba3d9014eeda92dc432f48 Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
* | Merge branch 'upstream' into mergeJeff Darcy2014-01-149-635/+400
|\|
| * consolidate code for #ifdef HAVE_LINKAT usageVijay Bellur2014-01-141-25/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sys_link() now does ifdef HAVE_LINKAT linkat (...) else link (...) endif Use sys_link() in all places where we previously had the conditional behavior. Change-Id: I8bce5ac1175efd2ba7ab4bb5b372f6d1e0365d28 BUG: 764655 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6633 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Anand Avati <avati@redhat.com>
| * locks: various fixesAnand Avati2014-01-137-626/+383
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - implement ref/unref of entry locks (and fix bad pointer deref crashes) - code cleanup and deleted various data types - fix improper read/write lock conflict detection in entrylk - fix indefinite hang of blocked locks on disconnect - register locks in client_t synchronously, fix crashes in disconnect path Change-Id: Id273690c9111b8052139d1847060d1fb5a711924 BUG: 849630 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6638 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
| * Use linkat() instead of link() for portability sakeEmmanuel Dreyfus2014-01-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | POSIX does not says wether link(2) on symlink should link on symlink itself or on target. Linux use symlink, most other systems use target. Using linkat(2) allows the behavior to be specified, so that the behavior is portable. Also fix configure test for NetBSD linkat(2), which ceased to work. BUG: 764655 Change-Id: I2633fde3b0828ca8c199e11c827720c513e15852 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6613 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
| * Use linkat() instead of link() for portability sakeEmmanuel Dreyfus2014-01-021-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | POSIX does not says wether link(2) on symlink should link on symlink itself or on target. Linux use symlink, most other systems use target. Using linkat(2) allows the behavior to be specified, so that the behavior is portable. Also fix configure test for NetBSD linkata(2), which ceased to work. BUG: 764655 Change-Id: Iccd27ac076b7a74e40dcbaa1c4762fd3ad59da5f Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6539 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
| * features/index: Minor improvement in log message.Vijay Bellur2013-12-271-1/+1
| | | | | | | | | | | | | | | | | | Change-Id: Ic4f39785dab5ad64def4c06d7bd2f2dec09e19ab BUG: 1045690 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6606 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
| * features/quota: log usage only if hard limit not exceeded.Raghavendra G2013-12-151-4/+9
| | | | | | | | | | | | | | | | | | Change-Id: I60abf576999996e0d0d65534e1e416f6e10994c8 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> BUG: 969461 Reviewed-on: http://review.gluster.org/6479 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
| * features/changelog: more changelog fixes.Ajeet Jha2013-12-128-38/+212
| | | | | | | | | | | | | | | | | | | | | | | | | | | | -> log additional records. -> include FOP number for metadata. -> prevent crash if inode is not found in a fop. Change-Id: I9edd4b71819ebd68c6a2b4150ae279c471d129da BUG: 1036536 Signed-off-by: Ajeet Jha <ajha@redhat.com> Reviewed-on: http://review.gluster.org/6403 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@gmail.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* | Temporarily revert NSR-specific changelog patches.Jeff Darcy2014-01-1413-585/+664
| | | | | | | | | | | | | | | | | | This is necessary so that patches from upstream can merge cleanly. Otherwise there are some nasty conflicts, and resolving them by hand gets even uglier than this approach. Change-Id: I4235f8ba0ad63563c2e7dec1a1e8eeb636657574 Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
* | Roll-up patch for NSR so far.Jeff Darcy2013-12-1116-468/+2550
|/ | | | | | | | Previous history: https://forge.gluster.org/~jdarcy/glusterfs-core/glusterfs-nsr Change-Id: I2b56328788753c6a74d9589815f2dd705ac9ce6a Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
* gfid-access: do chown() after creating the new entriesAmar Tumballi2013-12-102-8/+73
| | | | | | | | | | | | changing the 'frame->root->uid' on the fly is not a good idea as posix-acl xlator on brick process would fail the op. Change-Id: I996b43e4ce6efb04f52949976339dad6eb89bede Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 847839 Reviewed-on: http://review.gluster.org/5833 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* index: fix the order of initializationAnand Avati2013-12-041-3/+5
| | | | | | | | | | | | | thread spawning must be the last action. if not, failures after thread creation can set this->private to NULL and cause index_worker to segfault Change-Id: I71c85c25d7d6d1ed5f8d9c951db0262b8a3f1d90 BUG: 1034085 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6426 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/marker: Filter quota xattrs on file as wellPranith Kumar K2013-12-012-3/+70
| | | | | | | | | | | | | | | | | | | | | Problem: Quota contributions of a file/directory are tracked by quota xlator using xattrs on the file. Quota allows these xattrs to be healed as part of metadata self-heal. This leads to wrong quota calculations on this brick after self-heal because quota xattrs don't represent the actual contributions on the brick anymore. Fix: Don't let self-heal of this xattr happen as part of self-heal by filtering quota xattrs on file in listxattr. Change-Id: Iea68a116595ba271e58c6fdcc3dd21c7bb55ebb3 BUG: 1035576 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6374 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli, glusterd: More quota fixes ...Krutika Dhananjay2013-11-301-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... which may be grouped under the following categories: 1. Fix incorrect cli exit status for 'quota list' cmd 2. Print appropriate error message on quota parse errors in cli Authored by: Anuradha Talur <atalur@redhat.com> 3. glusterd: Improve quota validation during stage-op 4. Fix peer probe issues resulting from quota conf checksum mismatches 5. Enhancements to CLI output in the event of quota command failures Authored by: Kaushal Madappa <kmadappa@redhat.com> 7. Move aux mount location from /tmp to /var/run/gluster Authored by: Krishnan Parthasarathi <kparthas@redhat.com> 8. Fix performance issues in quota limit-usage Authored by: Krutika Dhananjay <kdhananj@redhat.com> Note: Some functions that were used in earlier version of quota, that aren't called anymore have been removed. Change-Id: I9d874f839ae5fdcfbe6d4f2d727eac091f27ac57 BUG: 969461 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/6366 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/index : Creation of indices directory as soon as brick is up.Anuradha2013-11-281-0/+5
| | | | | | | | | | | | | | | | | Missing indices directory in the bricks leads to unwanted log messages. Therefore, indices directory needs to be created as soon as the brick comes up. This patch results in creation of indices/xattrop directory as required. Also includes a testcase to test the same. Change-Id: Ic2aedce00c6c1fb24929ca41f6085aed28709d2c BUG: 1034085 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/6343 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: instruct marker whenever it shouldn't do accountingRaghavendra G2013-11-261-40/+2
| | | | | | | | | | | | | | | | | | This is needed for two reasons: * since dht-linkfiles are internal, they shouldn't be accounted. * hardlink handling in marker is broken. link/unlink of hardlinks present in same directory can break marker accounting. Hence, if src and dst are in same directory in case of rename, dht - if it breaks rename into link/unlink operations - should instruct marker to not to do accounting. Change-Id: I9c9f7384569f75a2792f6450ee7a5279bf751ae7 BUG: 1022995 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6203 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/marker-quota: exclude dht-linkfiles from being accounted.Raghavendra G2013-11-261-2/+2
| | | | | | | | | Change-Id: I3239f5e8477664dcc04434e4d455ae447493a7ac BUG: 1022995 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6153 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/quota: make writes short when the entire write cannot fit into ↵Raghavendra G2013-11-262-10/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | available space. This patch aims to prevent creation of infinite zero byte sized files due to amount of storage available before exceeding quota limit being less than write sizes. Imagine x bytes of storage is available before we exceed quota limit and quota enforcer is receiving writes of size y and (y > x). In this scenario, if we run a shell script like: # for i in $(seq 1 10); do dd if=/dev/zero of=$i bs=y count=1; done Then, we would end up with 10 zero byte sized files, because we allow only complete writes and all writes will fail because of lack of space. However, creates succeed since a create itself will consume zero bytes. In this pattern of creates and writes, size of volume would never grow and x bytes of space will always be available and we can end up with an infinite number of zero byte sized files. Change-Id: Ice148d6a2207883e41759f7b0be73abaa3198b41 BUG: 1012216 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6035 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/quota: Improvements to quotaRaghavendra G2013-11-2610-984/+2947
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Two stages of quota enforcement is done: Soft and hard quota Upon reaching soft quota limit on the directory it logs/alerts in the quota daemon log (ie DEFAULT_LOG_DIR/quotad.log) and no more writes allowed after hard quota limit. After reaching the soft-limit the daemon alerts the user/admin repeatively for every 'alert-time', which is configurable. * Quota enforcer is moved to server-side. It takes care of enforcing quota. Since enforcer doesn't have the cluster view, it relies on another service called quota-aggregator. Aggregator, on query can return the size of a directory based on the cluster view. Enforcer is always loaded in the server graph and is by passed if the feature is not enabled. Options specific to enforcer: server-quota - Specifies whether the feature is on/off. It is used to by pass the quota if turned off. deem-statfs - If set to on, it takes quota limits into consideration while estimating fs size. (df command). The algorithm followed is, i. Adjust statvfs based on limit configured on root. ii. If limit is set on the inode passed, use size/limits on that inode to populate statvfs. Otherwise, use size/limits configured on root. iii. Upon statvfs, update the ctx->size on the inode. iv. Don't let DHT aggregate, instead take the maximum of the usages from the subvols of the DHT, since each of it contains the complete information. Enforcer also makes use of gfid-to-path conversion functionality to work correctly when a client like nfs predominently relies on nameless lookups. * Quota Aggregator acts as a thin client to provide cluster view Its a lightweight *gluster client* process with no mount point, started upon enabling quota or restarting the volume. This is a single process run on each brick, which can answer queries on all volumes in the cluster. Its volfile stored in GLUSTERD_DEFAULT_WORKING_DIR/quotad/quotad.vol. Credits: Raghavendra Bhat <rabhat@redhat.com> Varun Shastry <vshastry@redhat.com> Shishir Gowda <sgowda@redhat.com> Kruthika Dhananjay <kdhananj@redhat.com> Brian Foster <bfoster@redhat.com> Krishnan Parthasarathi <kparthas@redhat.com> Change-Id: Id1cb25b414951da34c665a55f77385d482e0f9de BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/5952 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/marker: quota friendly changesRaghavendra G2013-11-266-90/+387
| | | | | | | | | | | | | | | | | | | | | | | | | * handles renames on dht linkfiles correctly * nameless lookup friendly changes. uses gfid-to-path conversion functionality from storage/posix to build ancestry till root. * log message cleanup. * build inode contexts in readdirp * Accounting still not correct with hardlinks. Credits: ======== Vijay Bellur <vbellur@redhat.com> Raghavendra Bhat <rabhat@redhat.com> Change-Id: I415b6fbbc9691f5a38d9fd3c5d083a61e578bb81 BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/5953 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* NetBSD missing backtrace(3) portability fixEmmanuel Dreyfus2013-11-191-0/+2
| | | | | | | | | | | | Implement backtrace(3) and backtrace_symbols(3) which do not exist in NetBSD While there, remove duplicate #include <stdio.h> BUG: 764655 Change-Id: Iccd695765906e085c3f8fcb670506d4fea68fa39 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6285 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* zerofill: Change the type of len argument of glfs_zerofill() to off_tBharata B Rao2013-11-141-1/+1
| | | | | | | | | | | | | | glfs_zerofill() can be potentially called to zero-out entire file and hence allow for bigger value of length parameter. Change-Id: I75f1d11af298915049a3f3a7cb3890a2d72fca63 BUG: 1028673 Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-on: http://review.gluster.org/6266 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com> Tested-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/compress: Compression/DeCompression translatorPrashanth Pai2013-11-117-1/+1039
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * When a writev call occurs, the client compresses the data before sending it to server. On the server, compressed data is decompressed. Similarly, when a readv call occurs, the server compresses the data before sending it to client. On the client, the compressed data is decompressed. Thus the amount of data sent over the wire is minimized. * Compression/Decompression is done using Zlib library. * During normal operation, this is the format of data sent over wire : <compressed-data> + trailer(8) The trailer contains the CRC32 checksum and length of original uncompressed data. This is used for validation. HOW TO USE ---------- Turning on compression xlator: gluster volume set <vol_name> compress on Configurable options: gluster volume set <vol_name> compress.compression-level 8 gluster volume set <vol_name> compress.min-size 50 Change-Id: Ib7a66b6f1f70fe002b7c513588cdf75c69370805 BUG: 923540 Original-author : Venky Shankar <vshankar@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Prashanth Pai <nullpai@gmail.com> Signed-off-by: Prashanth Pai <ppai@redhat.com> Reviewed-on: http://review.gluster.org/3251 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/qemu-block: simplify coroutine model to use single synctask, ucontextBrian Foster2013-11-107-248/+72
| | | | | | | | | | | | | | | | | | | | | | | | The current coroutine model, mapping synctasks 1-1 with qemu internal Coroutines, has some unresolved raciness issues. This problem usually manifests as lifecycle mismatches between top-level (gluster created) synctasks and the subsequently created internal coroutines from that context. Qemu's internal queueing (and locking) can cause situations where the top-level synctask is destroyed before the internal scheduler has released references to memory, leading to use after free crashes and asserts. Simplify the coroutine model to use a single synctask as a coroutine processor and rely on the existing native ucontext coroutine implementation. The syncenv thread is donated to qemu and ensures a single top-level coroutine is processed at a time. Qemu now has complete control over coroutine scheduling. BUG: 986775 Change-Id: I38223479a608d80353128e390f243933fc946fd6 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/6110 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/qemu-block: add qemu backing image support (clone)Brian Foster2013-11-104-23/+166
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add basic backing image support to the block-format mechanism. This is a functionality checkpoint that enables the raw mechanism required to support client driven "snapshot" and "clone" requests. This change enhances the block-format setxattr command to support an additional and optional backing image reference. For example: setxattr -n trusted.glusterfs.block-format -v "qcow2:10GB:<bimg>" ./newimage ... where <bimg> refers to the backing image for unallocated blocks in newimage. <bimg> can be provided in one of two formats: - a gfid string in the following format (assuming a valid gfid): <gfid:00000000-0000-0000-0000-000000000000> - or a filename that must be resident in the same directory as the new clone file being formatted. E.g., setxattr -n trusted.glusterfs.block-format -v "qcow2:10GB:baseimg" ./newimage This latter format is more restrictive, simply provided for convenience or until something more refined is available. This change makes no assumptions about the backing image file and affords no additional protection. It is up to the user/client to recognize the relationship between the files and manage them appropriately (i.e., no writes to the backing image, etc.). BUG: 986775 Change-Id: I7aff7bdc59b85a6459001a6bfeae4db6bf74f703 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/5967 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterfs: zerofill supportM. Mohan Kumar2013-11-101-0/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in the specified range. This fop will be useful when a whole file needs to be initialized with zero (could be useful for zero filled VM disk image provisioning or during scrubbing of VM disk images). Client/application can issue this FOP for zeroing out. Gluster server will zero out required range of bytes ie server offloaded zeroing. In the absence of this fop, client/application has to repetitively issue write (zero) fop to the server, which is very inefficient method because of the overheads involved in RPC calls and acknowledgements. WRITESAME is a SCSI T10 command that takes a block of data as input and writes the same data to other blocks and this write is handled completely within the storage and hence is known as offload . Linux ,now has support for SCSI WRITESAME command which is exposed to the user in the form of BLKZEROOUT ioctl. BD Xlator can exploit BLKZEROOUT ioctl to implement this fop. Thus zeroing out operations can be completely offloaded to the storage device , making it highly efficient. The fop takes two arguments offset and size. It zeroes out 'size' number of bytes in an opened file starting from 'offset' position. This patch adds zerofill support to the following areas: - libglusterfs - io-stats - performance/md-cache,open-behind - quota - cluster/afr,dht,stripe - rpc/xdr - protocol/client,server - io-threads - marker - storage/posix - libgfapi Client applications can exloit this fop by using glfs_zerofill introduced in libgfapi.FUSE support to this fop has not been added as there is no system call for this fop. Changes from previous version 3: * Removed redundant memory failure log messages Changes from previous version 2: * Rebased and fixed build error Changes from previous version 1: * Rebased for latest master TODO : * Add zerofill support to trace xlator * Expose zerofill capability as part of gluster volume info Here is a performance comparison of server offloaded zeofill vs zeroing out using repeated writes. [root@llmvm02 remote]# time ./offloaded aakash-test log 20 real 3m34.155s user 0m0.018s sys 0m0.040s [root@llmvm02 remote]# time ./manually aakash-test log 20 real 4m23.043s user 0m2.197s sys 0m14.457s [root@llmvm02 remote]# time ./offloaded aakash-test log 25; real 4m28.363s user 0m0.021s sys 0m0.025s [root@llmvm02 remote]# time ./manually aakash-test log 25 real 5m34.278s user 0m2.957s sys 0m18.808s The argument log is a file which we want to set for logging purpose and the third argument is size in GB . As we can see there is a performance improvement of around 20% with this fop. Change-Id: I081159f5f7edde0ddb78169fb4c21c776ec91a18 BUG: 1028673 Signed-off-by: Aakash Lal Das <aakash@linux.vnet.ibm.com> Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5327 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* client_t: phase 2, refactor server_ctx and locks_ctx outKaleb S. KEITHLEY2013-10-317-117/+423
| | | | | | | | | | | | | | remove server_ctx and locks_ctx from client_ctx directly and store as into discrete entities in the scratch_ctx hooking up dump will be in phase 3 BUG: 849630 Change-Id: I94cea328326db236cdfdf306cb381e4d58f58d4c Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/5678 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/gfid-access: Handle inode remap when parent inode is NULLPranith Kumar K2013-10-311-7/+2
| | | | | | | | | | Change-Id: Ic3f9d30d75df0bbbdf8fe28446fabe62d90fa854 BUG: 1024666 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6194 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: Add monotonic clocking counter for timer threadHarshavardhana2013-10-151-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gettimeofday() returns the current wall clock time and timezone. Using these functions in order to measure the passage of time (how long an operation took) therefore seems like a no-brainer. This time suffer's from some limitations: a. They have a low resolution: “High-performance” timing by definition, requires clock resolutions into the microseconds or better. b. They can jump forwards and backwards in time: Computer clocks all tick at slightly different rates, which causes the time to drift. Most systems have NTP enabled which periodically adjusts the system clock to keep them in sync with “actual” time. The adjustment can cause the clock to suddenly jump forward (artificially inflating your timing numbers) or jump backwards (causing your timing calculations to go negative or hugely positive). In such cases timer thread could go into an infinite loop. From 'man gettimeofday': ---------- .. .. The time returned by gettimeofday() is affected by discontinuous jumps in the system time (e.g., if the system administrator manually changes the system time). If you need a monotonically increasing clock, see clock_gettime(2). .. .. ---------- Rationale: For calculating interval timing for Timer thread, all that’s needed should be clock as a simple counter that increments at a stable rate. This is necessary to avoid the jumps which are caused by using "wall time", this counter must be monotonic that can never “tick” backwards, ever. Change-Id: I701d31e71a85a73d21a6c5cd15583e7a5a645eeb BUG: 1017993 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/6070 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: [Feature] Command implementation to get heal-countVenkatesh Somyajulu2013-10-142-13/+316
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently to know the number of files to be healed, either user has to go to backend and check the number of entries present in indices/xattrop directory. But if a volume consists of large number of bricks, going to each backend and counting the number of entries is a time-taking task. Otherwise user can give gluster volume heal vol-name info command but with this approach if no. of entries are very hugh in the indices/ xattrop directory, it will comsume time. So as a feature, new command is implemented. Command 1: gluster volume heal vn statistics heal-count This command will get the number of entries present in every brick of a volume. The output displays only entries count. Command 2: gluster volume heal vn statistics heal-count replica 192.168.122.1:/home/user/brickname Here if we are concerned with just one replica. So providing any one of the brick of a replica will get the number of entries to be healed for that replica only. Example: Replicate volume with replica count 2. Backend status: -------------- [root@dhcp-0-17 xattrop]# ls -lia | wc -l 1918 NOTE: Out of 1918, 2 entries are <xattrop-gfid> dummy entries so actual no. of entries to be healed are 1916. [root@dhcp-0-17 xattrop]# pwd /home/user/2ty/.glusterfs/indices/xattrop Command output: -------------- Gathering count of entries to be healed on volume volume3 has been successful Brick 192.168.122.1:/home/user/22iu Status: Brick is Not connected Entries count is not available Brick 192.168.122.1:/home/user/2ty Number of entries: 1916 Change-Id: I72452f3de50502dc898076ec74d434d9e77fd290 BUG: 1015990 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/6044 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/qemu-block: invoke bdrv_init() only onceBrian Foster2013-10-091-1/+5
| | | | | | | | | | | | | | | | bdrv_init() is intended to be invoked only once. If invoked again after initialization (i.e., due to graph changes), the block driver registration code can reinsert entries into bdrv_drivers, effectively corrupting the NULL terminated linked list. This manifests as an infinite loop for callers attempt to traverse the list (to locate a driver). BUG: 986775 Change-Id: I2f95bda3c753dece4648cdad009d0e2a2d16f644 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/6021 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/changelog : Improvement in changelog "encoding-change".ajha2013-09-294-17/+26
| | | | | | | | | | | | | | | | | change in encoding method of changelog was critical section for "fop dispatch thread", "roll-over thread" and "reconfigure dispatch thread". In this patch the "encoding-method" is changed by the reconfigure dispatch thread lazily during handle_change, which solves the concurrency among the racing threads. BUG: 1002940 Change-Id: I78c3e8887efa46d0fcc60755cdf4243031cfa3eb Signed-off-by: Ajeet Jha <ajha@redhat.com> Reviewed-on: http://review.gluster.org/5844 Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net>
* core: block unused signals in created threadsAnand Avati2013-09-253-9/+9
| | | | | | | | | | | | | | | Block all signal except those which are set for explicit handling in glusterfs_signals_setup(). Since thread spawning code in libglusterfs and xlators can get called from application threads when used through libgfapi, it is necessary to do this blocking. Change-Id: Ia320f80521a83d2edcda50b9ad414583a0175281 BUG: 1011662 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5995 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* qemu-block: support readdirpBrian Foster2013-09-171-0/+58
| | | | | | | | | | | | | | Support the readdirp fop in qemu-block to ensure that image files are handled correctly when readdirp is enabled. E.g., without readdirp support, incorrect stat data for formatted files can be reported back (and cached) via the client. BUG: 986775 Change-Id: Ibc4bd0b42548953ebe30832f3d853bb68095f0ac Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/5946 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* qemu-block: fix building from distribution tarball when glib2-devel is installedNiels de Vos2013-09-121-0/+8
| | | | | | | | | | | | | | | | | | | | | | Building RPMs from a 'make dist' tarball fails when qemu-block is enabled. Enabling is done automatically when the glib2 development files are available (enabled by ./configure). Manual building with: $ ./autogen.sh && ./configure && make dist && rpmbuild -ta *.tar.gz Building in mock works fine, glib2-devel is not installed by default so the qemu-block xlator gets disabled. This change also adds glib2-devel to the BuildRequires in the glusterfs.spec file, causing the qemu-block xlator to be built by default, and included in the glusterfs RPM. Change-Id: Ibb73628772586d9e07bbfde7a8ff2fc973489086 BUG: 986775 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/5896 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gfid-access: Error logs for ga_newfile_parse_argsAvra Sengupta2013-09-051-11/+50
| | | | | | | | | | Change-Id: I7aab98a70793bee936272f0b795f4c22c3733dd2 BUG: 1001055 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5705 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>