path: root/xlators/storage/posix/src/posix.c
Commit message | Author | Age | Files | Lines
...
* posix: Fix volume will not start if brick has no volume-id attribute | Venkatesh Somyajulu | 2012-11-20 | 1 | -7/+4
  Problem: If the extended attribute (trusted.glusterfs.volume-id) of a brick is absent and the <gluster volume start volume-name> command is executed, then currently the volume-id from the volume file is set as an extended attribute of the brick and the volume is started. But if the setup is such that the brick is used as a mount point, and nothing is mounted on the brick before <gluster volume start volume-name> is executed, then all file operations take place on the brick directory itself, although the brick was intended to be used only as a mount point.

  Fix: Do not start the volume if the extended attribute (trusted.glusterfs.volume-id) is absent.

  Change-Id: Id2462d87d6087e97e0b8831512fdbc3595f7078b
  BUG: 860297
  Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
  Reviewed-on: http://review.gluster.org/4202
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* storage/posix: Make rchecksum O_DIRECT friendly | Pranith Kumar K | 2012-11-20 | 1 | -23/+48
  Problem: When posix-aio is enabled, the fd is set with O_DIRECT whenever possible in the read and writev fops so that aio can be performed. Rchecksum does not take this into account. If the offset, size or memory buffer passed to pread in the rchecksum fop is not aligned, pread fails with EINVAL.

  Fix: Before doing pread, the necessary O_DIRECT manipulation is done when aio is enabled. The memory buffer passed to pread is now page-aligned.

  Test:
  1) Create replica volume with aio enabled.
  2) dd if=/dev/urandom of=a bs=1M count=1
  3) kill one of the bricks in the replica pair
  4) dd if=/dev/urandom of=a bs=1M count=1
  5) bring back the brick.
  Self-heal succeeds after the change. The test above checks both the rchecksum and writev fops that were changed in this patch.

  Change-Id: I186099a2854d4864c5b48086ab7bc5f1a7b27313
  BUG: 866459
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/4134
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
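The alignment requirement described in the rchecksum commit can be sketched in isolation. This is a minimal illustration, not the code from the patch: the helper names (`is_aligned`, `direct_io_ok`, `alloc_page_aligned`) are hypothetical, and using the page size as the alignment unit is an assumption that happens to match the "page-aligned" wording of the commit message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: true when value is a multiple of align.
 * align must be a non-zero power of two. */
static int is_aligned(uint64_t value, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value & ((uint64_t)align - 1)) == 0;
}

/* Hypothetical helper: may (buf, offset, len) be handed to pread() on an
 * O_DIRECT fd without risking EINVAL? All three must be aligned. */
static int direct_io_ok(const void *buf, off_t offset, size_t len, size_t align)
{
    return is_aligned((uint64_t)(uintptr_t)buf, align) &&
           is_aligned((uint64_t)offset, align) &&
           is_aligned((uint64_t)len, align);
}

/* Allocate a page-aligned buffer of len bytes; returns NULL on failure.
 * The caller releases it with free(). */
static void *alloc_page_aligned(size_t len)
{
    void *buf = NULL;
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || posix_memalign(&buf, (size_t)page, len) != 0)
        return NULL;
    return buf;
}
```

A caller would allocate the read buffer with `alloc_page_aligned()` and fall back to buffered I/O (or round the request) whenever `direct_io_ok()` is false.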
* storage/posix: remove dependency on loc->path in posix_lookup() | Amar Tumballi | 2012-10-11 | 1 | -1/+0
  Change-Id: I0a3bc8650d9ff83977be696aa5caf9c7570197fd
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 781318
  Reviewed-on: http://review.gluster.org/3997
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* logging: log ENOENT errors in DEBUG mode instead of ERROR or INFO | Raghavendra Bhat | 2012-09-17 | 1 | -1/+2
  Change-Id: I0a43769223991e4ad5206b4382d737a0c3557bf3
  BUG: 851953
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: http://review.gluster.org/3934
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* storage/posix: Option to set brick(of a volume)'s root dir's uid/gid | Krishnan Parthasarathi | 2012-09-14 | 1 | -5/+43
  CLI
  ---
  gluster volume set VOLNAME owner-uid uid
  gluster volume set VOLNAME owner-gid gid

  where uid and gid are the owner's user id and group id respectively, which are set on the root of all brick (backend) filesystems.

  TODO: uid/gid should not be -1. Today we don't validate that in CLI.

  Change-Id: Ib6a2fb5e404691c5fe105a89faaeff3e1ab72e91
  BUG: 853842
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/3891
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* All: License message change | Varun Shastry | 2012-09-13 | 1 | -7/+6
  License message changed for server-side, dual license GPLV2 and LGPLv3+.

  Change-Id: Ia9e53061b9d2df3b3ef3bc9778dceff77db46a09
  BUG: 852318
  Signed-off-by: Varun Shastry <vshastry@redhat.com>
  Reviewed-on: http://review.gluster.org/3940
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix: Make posix_fremovexattr anon fd friendly. | Pranith Kumar K | 2012-09-06 | 1 | -4/+1
  Problem: For anonymous fds posix_fremovexattr fails to work because the open never happens and the fd-ctx is not set with the fd-number.

  Fix: Use posix_fd_ctx_get which opens and sets the fd-number in the fd-ctx for anonymous fds.

  Tests: Added a syncop call in glustershd to test this change and it worked fine.

  Change-Id: I9629190a87eb27a7a1578e4fe732a5eb1248f30c
  BUG: 854331
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.org/3903
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* posix: adjust new xattrops to new dict API | Csaba Henk | 2012-09-06 | 1 | -4/+4
  - http://review.gluster.org/3909 introduces new xattrops
  - http://review.gluster.org/3829 changes the dict API

  The new xattrops were written against the old dict API but committed after the dict API change, resulting in a build error.

  Change-Id: I10b9acc79927f3505b5e13116653fb9a584ffd31
  BUG: 850917
  Signed-off-by: Csaba Henk <csaba@redhat.com>
  Reviewed-on: http://review.gluster.org/3915
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix: Add or_array/and_array op for xattrop | Pranith Kumar K | 2012-09-06 | 1 | -1/+35
  Problem: To set or reset an outcast (ALL changelog bits set per transaction type, i.e. data/mdata/entry) from afr, posix needs OR/AND capability in xattrop. Otherwise marking an outcast is only possible in self-heals, where appropriate locks are held so that no other transaction is in progress and the exact number can be computed which, when applied with XATTROP_ADD, sets all bits of that changelog.

  Fix: Implemented the new xattrop ops OR_ARRAY and AND_ARRAY. Made checks in __add_array to work well with __or_array.

  Tests: From afr code made an OR_ARRAY with ALL bits set and it reflected on the changelog xattrs. Changelog incrementing did not have any effect on the all-set changelog. From afr code made an AND_ARRAY with 0 and it reflected in the changelog xattrs.

  Change-Id: Ie89c78a43d05789e3a8fa03d2422b52083ae80b9
  BUG: 847671
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.org/3909
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
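The OR/AND semantics described in the xattrop commit can be sketched on plain arrays. This is an illustration under stated assumptions, not the patch itself: the function names are hypothetical, the counters are assumed to be 32-bit values stored in network byte order, and the "skip an all-ones counter on add" rule is only one reading of the note that incrementing had no effect on an all-set changelog.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element-wise OR. Bitwise operations give the same result in any byte
 * order, so no conversion is needed. */
static void or_array(uint32_t *dest, const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dest[i] |= src[i];
}

/* Element-wise AND; AND-ing with zeros resets the counters. */
static void and_array(uint32_t *dest, const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dest[i] &= src[i];
}

/* Element-wise add on network-order counters. Assumption: a counter with
 * all bits set (the "outcast" marker) is left untouched, so a later
 * increment cannot wrap it around to a small number. */
static void add_array_keep_all_set(uint32_t *dest, const uint32_t *src,
                                   size_t count)
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < count; i++) {
        if (dest[i] == UINT32_MAX)
            continue;
        dest[i] = htonl(ntohl(dest[i]) + ntohl(src[i]));
    }
}
```

With these helpers, OR-ing a changelog with an all-ones array marks it, further adds leave it marked, and AND-ing with zeros clears it.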
* libglusterfs/dict: make 'dict_t' a opaque object | Amar Tumballi | 2012-09-06 | 1 | -46/+42
  * ie, don't dereference dict_t pointer, instead use APIs everywhere
  * other than dict_t only 'data_t' should be the valid export from dict.h
  * added 'dict_foreach_fnmatch()' API
  * changed dict_lookup() to use data_t, instead of data_pair_t

  Change-Id: I400bb0dd55519a7c5d2a107e67c8e7a7207228dc
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 850917
  Reviewed-on: http://review.gluster.org/3829
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* All: License message change | Varun Shastry | 2012-08-28 | 1 | -14/+5
  The license message is changed to

  Copyright (c) 2008-2012 Red Hat, Inc. <http://www.redhat.com>
  This file is part of GlusterFS.

  This file is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation.

  Change-Id: I07d2b63ed5fbbbd1884f1e74f2dd56013d15b0f4
  BUG: 852318
  Signed-off-by: Varun Shastry <vshastry@redhat.com>
  Reviewed-on: http://review.gluster.org/3858
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Optimize readdirp calls in DHT | shishir gowda | 2012-08-13 | 1 | -2/+32
  Bring in an option, supported by the posix xlator, to filter directory entries out of the returned results. DHT now requests non-first subvols to filter out directory entries.

  The dht xlator-option readdir-optimize enables this optimization.

  Change-Id: I35224bc81c9657f54f952efac02790276c35ded5
  BUG: 838199
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.com/3772
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix: use the size returned by getxattr to allocate memory | Raghavendra Bhat | 2012-07-17 | 1 | -2/+2
  Change-Id: I71c234b12a1d16405e508b715932022fdce346f0
  BUG: 838195
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: http://review.gluster.com/3681
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix: use ssize_t variable to get the return value of getxattr | Raghavendra Bhat | 2012-07-17 | 1 | -36/+47
  Change-Id: Ida065e108a1d2a61b134fb847e8c4981b46fc3c6
  BUG: 838195
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: http://review.gluster.com/3673
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix: implement native linux AIO support | Anand Avati | 2012-07-14 | 1 | -1/+47
  Configurable via cli with "storage.linux-aio" settable option

  Change-Id: I9929e0d6fc1bbc2a0fe1fb67bfc8d15d8a483d3f
  BUG: 837495
  Signed-off-by: Anand Avati <avati@redhat.com>
  Reviewed-on: http://review.gluster.com/3627
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
* remove useless if-before-free (and free-like) functions | Jim Meyering | 2012-07-13 | 1 | -8/+4
  See comments in http://bugzilla.redhat.com/839925 for the code to perform this change.

  Signed-off-by: Jim Meyering <meyering@redhat.com>
  BUG: 839925
  Change-Id: I10e4ecff16c3749fe17c2831c516737e08a3205a
  Reviewed-on: http://review.gluster.com/3661
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix: handle getxattr failures gracefully | Raghavendra Bhat | 2012-07-11 | 1 | -5/+50
  Use proper variable types for the return value of getxattr calls. Using the wrong type can otherwise lead to segfaults in processes or page allocation failures in the kernel.

  Change-Id: I62ab5d6c378447090c19846f03298c3afc8863ba
  BUG: 838195
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: http://review.gluster.com/3640
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
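Several commits in this log touch the same pattern: probe the xattr size, check the signed result, then allocate and read. The sketch below shows that pattern with the Linux `lgetxattr()` call. The helper name `read_xattr` is hypothetical and the code is not taken from posix.c; it assumes a Linux system where `<sys/xattr.h>` is available.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical helper: returns a malloc'd, NUL-terminated copy of the
 * xattr value and stores its length in *len_out. On failure it returns
 * NULL with errno set and leaves *len_out untouched. */
static char *read_xattr(const char *path, const char *key, ssize_t *len_out)
{
    assert(path != NULL && key != NULL && len_out != NULL);

    /* First call with a zero-sized buffer only asks for the value size.
     * The result must be kept in a signed ssize_t so that -1 is seen as
     * an error instead of turning into a huge allocation size. */
    ssize_t size = lgetxattr(path, key, NULL, 0);
    if (size < 0)
        return NULL;

    char *value = calloc(1, (size_t)size + 1);
    if (value == NULL) {
        errno = ENOMEM;
        return NULL;
    }

    /* Second call passes the probed size, not a stale return code. */
    ssize_t got = lgetxattr(path, key, value, (size_t)size);
    if (got < 0) {
        int saved = errno;
        free(value);
        errno = saved;
        return NULL;
    }

    *len_out = got;
    return value;
}
```

The caller owns the returned buffer and releases it with `free()`.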
* cluster/stripe: implement the coalesce stripe file format | Brian Foster | 2012-06-07 | 1 | -1/+1
  The coalesce file format for cluster/stripe condenses the striped files to a contiguous layout. The elimination of holes in striped files eliminates space wasted via local filesystem preallocation heuristics and significantly improves read performance.

  Coalesce mode is implemented with a new 'coalesce' xlator option, which is user-configurable and disabled by default. The format of newly created files is marked with a new 'stripe-coalesce' xattr. Cluster/stripe handles/preserves the format of files regardless of the current mode of operation (i.e., a volume can simultaneously consist of coalesced and non-coalesced files). Files without the stripe-coalesce attribute are assumed to have the traditional format to provide backward compatibility.

  extras/stripe-merge: support traditional and coalesce stripe formats

  Update the stripe-merge recovery tool to handle the traditional and coalesced file formats. The format of the file is detected automatically (and verified) via the stripe-coalesce attributes.

  BUG: 801887
  Change-Id: I682f0b4e819f496ddb68c9a01c4de4688280fdf8
  Signed-off-by: Brian Foster <bfoster@redhat.com>
  Reviewed-on: http://review.gluster.com/3282
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix: Prevent gfid handle leaks | Pranith Kumar K | 2012-05-31 | 1 | -4/+6
  The case which can lead to gfid handle leaks: Self-heal removes directory '/d' with 10 files in it, in brick b1. This dir is renamed to <landfill>/<hashval of '<brick-path>/d'> by posix. Before the janitor thread can remove the directory, self-heal may remove another directory with the same path '/d'. Posix then renames it to the same landfill path as before. The gfid-handles of the old '/d' and of the 10 files in it are not unlinked.

  To prevent such problems, rename the directory to be removed to <landfill>/<gfid-str>.

  Change-Id: Iad13708e1ebcc5222b64c058aa9a2d372e1bfa5b
  BUG: 811970
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/3159
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
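The collision described in the landfill commit can be shown with two naming schemes side by side. This is a toy model: the hash function is a stand-in (djb2), since the commit message does not say which hash posix used, and the function names and the landfill directory path are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in hash (djb2). Assumption: any deterministic hash of the path
 * shows the same problem, whatever hash the real code used. */
static unsigned long toy_path_hash(const char *s)
{
    unsigned long h = 5381;

    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Old scheme: the landfill entry is named after a hash of the path, so
 * two successive directories at the same path get the same name. */
static void landfill_name_from_path(char *out, size_t n,
                                    const char *landfill, const char *path)
{
    snprintf(out, n, "%s/%lu", landfill, toy_path_hash(path));
}

/* New scheme: the landfill entry is named after the gfid string, which
 * differs for every directory ever created. */
static void landfill_name_from_gfid(char *out, size_t n,
                                    const char *landfill, const char *gfid_str)
{
    assert(strlen(gfid_str) == 36); /* canonical UUID text form */
    snprintf(out, n, "%s/%s", landfill, gfid_str);
}
```

Two removals of '/d' produce identical names under the first scheme and distinct names under the second, which is why the second rename can no longer land on top of a directory the janitor has not processed yet.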
* storage/posix: Move landfill inside .glusterfs | Pranith Kumar K | 2012-05-31 | 1 | -18/+8
  Change-Id: Ia2944f891dd62e72f3c79678c3a1fed389854a90
  BUG: 811970
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/3158
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix: readdirp enhancements | Anand Avati | 2012-05-28 | 1 | -45/+61
  - avoid multiple calls to posix_istat(). use cheaper posix_pstat()
  - code re-org

  Change-Id: I4a2e32626ade49b7d18158952849c6fe7bd6875c
  BUG: 816140
  Reviewed-on: http://review.gluster.com/3460
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix,fuse-bridge: fill the d_type attribute in READDIRP replies | Niels de Vos | 2012-05-18 | 1 | -0/+1
  d_type should contain the type of the dir-entry (man 3 readdir). Currently the d_type is always set to DT_UNKNOWN (0).

  The POSIX standard readdir() returns a 'struct dirent' on both Linux and NetBSD with the d_type attribute.

  Commit bb315cb180c3547218b5ed581d38e76aec74cf94 removed setting d_type in xlators/mount/fuse/src/fuse-bridge.c. This was using d_type_from_stat(). The stat() seems to have been removed for performance reasons. Instead of removing d_type completely, dirent->d_type could have been used.

  Therefore the fuse-bridge can now add "fde->type = entry->d_type" back into fuse_readdir_cbk() without causing the previous performance impact.

  Change-Id: I4514bbc0acceb33d09c3cf50bda51e34d953efca
  BUG: 817785
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-on: http://review.gluster.com/3256
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* iobuf: option to provide larger size of buffers | Amar Tumballi | 2012-05-03 | 1 | -4/+1
  Provide an option to fail over to standard allocation if an iobuf of the required size doesn't exist. This is achieved by keeping an arena dedicated to all the out-of-boundary allocations.

  Change-Id: I41a2bd7d353dc7bcb2e1a6e4b41735afe9865975
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 812784
  Reviewed-on: http://review.gluster.com/3136
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* log cleanup: in setxattr() path | Amar Tumballi | 2012-05-02 | 1 | -2/+18
  * in posix we log occasionally if errno is ENOTSUP, added a suggestion to mount with 'user_xattr' option.
  * changed server's *etxattr_cbk to log ENOTSUP in debug level.
  * changed client's *etxattr_cbk to log ENOTSUP in debug level.

  Change-Id: Icd604050aaa68546011f2c950ecd7883ac6ee820
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 811957
  Reviewed-on: http://review.gluster.com/3140
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
* storage/posix: fix illegal memory access in fgetxattr() | Amar Tumballi | 2012-04-27 | 1 | -3/+7
  We were not checking the return value of fgetxattr(key) and continued with the allocation even if size was -1, leading to wrong memory access.

  Change-Id: Ib5cf2e74fee95bc919b12efe89fed5cd25807efd
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 815346
  Reviewed-on: http://review.gluster.com/3236
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix: Fix leak of dict in getxattr calls. | shishir gowda | 2012-04-27 | 1 | -3/+1
  get_new_dict does not take a ref. A ref was taken only when data was added to the dict. But at the out label we call an explicit unref, which moves the ref count to -1 if the call was unsuccessful. unref destroys the dict only if ref == 0.

  Change-Id: Ie08c301237c2042daf90a7ef25569e3b06e3e1e9
  BUG: 816870
  Signed-off-by: shishir gowda <shishirng@gluster.com>
  Reviewed-on: http://review.gluster.com/3240
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
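The reference-count arithmetic behind the dict leak can be modelled with a toy object. This is only a model of the counting described in the commit message; `toy_dict_t` and its functions are hypothetical and do not reproduce the libglusterfs dict API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a reference-counted dictionary. */
typedef struct {
    int refcount;   /* number of references currently held */
    int destroyed;  /* set once the object would have been freed */
} toy_dict_t;

/* Like the allocation described in the commit: no reference is taken. */
static toy_dict_t toy_dict_new(void)
{
    toy_dict_t d = { 0, 0 };
    return d;
}

static void toy_dict_ref(toy_dict_t *d)
{
    assert(d != NULL);
    d->refcount++;
}

/* Destroys only when the count lands exactly on zero, as the commit
 * message describes. */
static void toy_dict_unref(toy_dict_t *d)
{
    assert(d != NULL);
    d->refcount--;
    if (d->refcount == 0)
        d->destroyed = 1;
}
```

An unref on a freshly created object moves the count from 0 to -1 and skips destruction, which is the leak; taking a reference first makes the same unref land on 0 and release the object.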
* core: coverity issues fixed | Amar Tumballi | 2012-04-23 | 1 | -25/+22
  This patch does not fix the complete set of issues. The remaining issues will be addressed in another patch.

  Change-Id: Ib01c7b11b205078cc4d0b3f11610751e32d14b69
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 789278
  Reviewed-on: http://review.gluster.com/3145
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* storage/posix: Don't allow mkdir() on HIDDEN_DIRECTORY | Kaushal M | 2012-04-20 | 1 | -0/+12
  Change-Id: Iecbd71d13ee8a492a99689674be99b4a451593db
  BUG: 788150
  Signed-off-by: Kaushal M <kaushal@redhat.com>
  Reviewed-on: http://review.gluster.com/3200
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
* storage/posix: Handle gfid-less lookup | Pranith Kumar K | 2012-04-12 | 1 | -3/+9
  Change-Id: I4605dbb1dd8bf8e26de7f253e54a7f4840c8a8be
  BUG: 795355
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/3128
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* Fix compiler warnings and typos from Debian build. | Jeff Darcy | 2012-04-10 | 1 | -1/+1
  Mostly to do with "-Werror=format-security" being buggy, but while we're here we might as well fix some typos and such. Credit goes to Patrick Matthäi <pmatthaei@debian.org> for pointing these out.

  Change-Id: Ia32d1111d7c10b1f213df85d86b17a1326248ffd
  BUG: 811387
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.com/3117
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix: Avoid excessive logging in posix. | Mohammed Junaid | 2012-04-09 | 1 | -1/+4
  When quota or gsyncd is enabled, the marker translator performs setxattr on files/directories. If the file/directory is deleted before setxattr, posix gets an error when it does setxattr and logs it. But it's not an error for marker, which handles the case gracefully. Hence, avoid logging for these keys.

  Change-Id: Ic614777399497be92ed1c2b4718d46adfb639d96
  BUG: 765498
  Signed-off-by: Mohammed Junaid <junaid@redhat.com>
  Reviewed-on: http://review.gluster.com/3105
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Replace GPLV3 MD5 with OpenSSL MD5 | Kaleb KEITHLEY | 2012-04-04 | 1 | -5/+5
  Ric asked me to look at replacing the GPL licensed MD5 code with something better, i.e. perhaps faster, and with a less restrictive license, etc. So I took a couple hour holiday from working on wrapping up the client_t and did this.

  OpenSSL (nee SSLeay) is released under the OpenSSL license, a BSD/MIT style license. OpenSSL (libcrypto.so) is used on Linux, OS X and *BSD, Open Solaris, etc. IOW it's universally available on the platforms we care about. It's written by Eric Young (eay), now at EMC/RSA, and I can say from experience that the OpenSSL implementation of MD5 (at least) is every bit as fast as RSA's proprietary implementation (primarily because the implementations are very, very similar.) The last time I surveyed MD5 implementations I found they're all pretty much the same speed.

  I changed the APIs (and ABIs) for the strong and weak checksums. Strictly speaking I didn't need to do that. They're only called on short strings of data, i.e. pathnames, so using int32_t and uint32_t is ostensibly okay. My change is arguably a better, more general API for this sort of thing. It's also what bit me when gerrit/jenkins validation failed due to glusterfs segv-ing. (I didn't pay close enough attention to the implementation of the weak checksum. But it forced me to learn what gerrit/jenkins are doing and going forward I can do better testing before submitting to gerrit.)

  Now resubmitting with a BZ

  Change-Id: I545fade1604e74fc68399894550229bd57a5e0df
  BUG: 807718
  Signed-off-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: http://review.gluster.com/3019
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* core: adding extra data for fops | Amar Tumballi | 2012-03-22 | 1 | -104/+109
  With this change, the xlator APIs take a dictionary as an extra argument, which is passed between all the layers. This can be used to overload some of the operations.

  Change-Id: I58a8186b3ef647650280e63f3e5e9b9de7827b40
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 782265
  Reviewed-on: http://review.gluster.com/2960
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix: don't allow rmdir()/lookup() on HIDDEN_DIRECTORY | Amar Tumballi | 2012-03-14 | 1 | -1/+26
  so that we won't even have a GFID set on the GFID dir itself.

  Change-Id: I65be7d675a308f51f4c62a86499341412b20c47f
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 802726
  Reviewed-on: http://review.gluster.com/2936
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
* posix_fill_readdir: Using strcmp to compare GF_HIDDEN_PATH with entry->d_name instead of strncmp. | M S Vishwanath Bhat | 2012-03-11 | 1 | -2/+1
  Change-Id: I29b6fc81213e52a697ed96559c3216c5512799ed
  BUG: 802005
  Signed-off-by: M S Vishwanath Bhat <vishwanath@gluster.com>
  Reviewed-on: http://review.gluster.com/2910
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
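The commit message does not state why strcmp replaced strncmp. One plausible reason, shown below as an assumption, is that a length-limited comparison matches every name that merely starts with the hidden directory name. The value ".glusterfs" for GF_HIDDEN_PATH and both helper names are assumptions made for this sketch.

```c
#include <assert.h>
#include <string.h>

/* Assumed value of GF_HIDDEN_PATH, for illustration only. */
#define HIDDEN_NAME ".glusterfs"

/* Prefix comparison: also matches unrelated names such as
 * ".glusterfs-old" that only begin with the hidden name. */
static int hidden_by_prefix(const char *d_name)
{
    assert(d_name != NULL);
    return strncmp(d_name, HIDDEN_NAME, strlen(HIDDEN_NAME)) == 0;
}

/* Exact comparison: matches the hidden directory and nothing else. */
static int hidden_by_exact(const char *d_name)
{
    assert(d_name != NULL);
    return strcmp(d_name, HIDDEN_NAME) == 0;
}
```

With the prefix variant, a user directory named ".glusterfs-old" would be filtered out of readdir results along with the real hidden directory; the exact variant keeps it visible.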
* glusterd/rebalance: Bring in support for parallel rebalance | shishirng | 2012-03-07 | 1 | -1/+1
  This patch enables rebalance processes to be started on all nodes across which the volume is spread (1 process per node).

  The node-uuid xattr identifies which node takes ownership of the task of migrating the file. The model employed is push (src pushes to dst).

  Change-Id: Ieacd46a6216cf6ded841bbaebd10cfaea51c16d6
  BUG: 763844
  Signed-off-by: shishirng <shishirng@gluster.com>
  Reviewed-on: http://review.gluster.com/2873
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
* storage/posix: In lookup check for buf->gfid being NULL | shishir gowda | 2012-02-29 | 1 | -0/+6
  There are a few cases where create and lookup race. Lookup ends up getting a valid struct iatt, but with no gfid set. We need to check for gfid being 0, and handle it as an error.

  Signed-off-by: shishir gowda <shishirng@gluster.com>
  Change-Id: I36ae1978b325aff964cbc3b24730c1e993666267
  BUG: 797167
  Reviewed-on: http://review.gluster.com/2832
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
* glusterfsd: unref the dict and free the memory to avoid memleak | Raghavendra Bhat | 2012-02-27 | 1 | -0/+1
  Change-Id: Ib7a1f8cbab039fefb73dc35560a035d5688b0e32
  BUG: 796186
  Signed-off-by: Raghavendra Bhat <raghavendrabhat@gluster.com>
  Reviewed-on: http://review.gluster.com/2808
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Introduce new extended attribute: node-uuid | Venky Shankar | 2012-02-22 | 1 | -34/+87
  A request for trusted.glusterfs.node-uuid returns a pathinfo-like string that contains the UUID of glusterd instead of the backend path for the requested file.

  This info is beneficial for tasks like parallel rebalance, which will make use of the UUID for data locality.

  Change-Id: I766a09cc4a5f63aebd11c73107924a1b29242dcf
  BUG: 772610
  Signed-off-by: Venky Shankar <vshankar@redhat.com>
  Reviewed-on: http://review.gluster.com/2614
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Shishir Gowda <shishirng@gluster.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
* storage/posix: Add xattr for gfid2path | Pranith Kumar K | 2012-02-20 | 1 | -0/+19
  Change-Id: I1fe987d255bf50e8433043749b482b67554a0ac3
  BUG: 763820
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/2774
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/dht: Support for hardlink rebalance when decommissioning | shishir gowda | 2012-02-19 | 1 | -0/+1
  The support for hardlink rebalance is only available for decommissioning of a node. This can be triggered in two ways:
  1. remove-brick start
  2. if decommission node value is set in vol file, then a normal rebalance command

  The way we handle it is:

  if (nlink > 1)
  do
   * if src file doesn't have linkto xattr
     * mark src's linkto to the dst
   * else
     * perform a link on the dst
   * do a look up
   * if nlinks = dst.nlinks
     * migrate data
   * else
     * continue crawling
  done

  Signed-off-by: shishir gowda <shishirng@gluster.com>
  Change-Id: If43b5524b872fd1413e9f7aa7f436cb244e30d8d
  BUG: 763844
  Reviewed-on: http://review.gluster.com/2737
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
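The hardlink handling listed in the commit above can be read as two small decisions per crawled name. The sketch below encodes one reading of that list: the enum and function names are hypothetical, and it assumes the lookup and link-count comparison follow both branches of the first decision, which the flattened commit text does not make certain.

```c
#include <assert.h>

/* First decision: what to do with a hardlinked source file. */
typedef enum { HL_MARK_LINKTO, HL_LINK_ON_DST } hl_first_step_t;

/* Second decision, taken after the lookup on the destination. */
typedef enum { HL_MIGRATE_DATA, HL_CONTINUE_CRAWL } hl_second_step_t;

/* Only files with more than one link take this path. The first name
 * seen marks the source's linkto; later names add a link on the dst. */
static hl_first_step_t hardlink_first_step(unsigned src_nlink,
                                           int src_has_linkto)
{
    assert(src_nlink > 1);
    return src_has_linkto ? HL_LINK_ON_DST : HL_MARK_LINKTO;
}

/* Data moves only once the destination carries as many links as the
 * source; until then the crawl goes on to find the remaining names. */
static hl_second_step_t hardlink_second_step(unsigned src_nlink,
                                             unsigned dst_nlink)
{
    return src_nlink == dst_nlink ? HL_MIGRATE_DATA : HL_CONTINUE_CRAWL;
}
```

For a file with three names, the first visit marks the linkto, the next visits add links on the destination, and the data is migrated on the visit where both link counts reach three.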
* posix: handle some internal behavior in posix_mknod() | Amar Tumballi | 2012-02-16 | 1 | -0/+20
  Assume a link() system call, which distribute handles by creating a 'linkfile' in the hashed subvolume if the 'oldloc' is present in a different subvolume.

  We use the same 'gfid' for the linkfile as for the file, for consistency. Now, for a file with multiple hardlinks, we may end up with 'hardlinked' linkfiles. dht creates the linkfile using the 'mknod()' fop, and as of now posix_mknod() is not equipped to handle this situation.

  This patch fixes the situation by looking at the 'internal' key set in the dictionary to differentiate calls which originate from inside from regular system calls.

  Change-Id: Ibff7c31f8e0c8bdae035c705c93a295f080ff985
  BUG: 763844
  Signed-off-by: Amar Tumballi <amar@gluster.com>
  Reviewed-on: http://review.gluster.com/2755
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix: reset op_ret to -1 after call to MAKE_INODE_HANDLE | shishir gowda | 2012-02-16 | 1 | -0/+5
  MAKE_INODE_HANDLE uses op_ret. We did not reset it to -1 afterwards, and in a few instances we jump to the label out, where we unwind with op_ret.

  Change-Id: Iac4d9f250f5253b3ce0cd91cc385168247efd4a8
  BUG: 788998
  Signed-off-by: shishir gowda <shishirng@gluster.com>
  Reviewed-on: http://review.gluster.com/2759
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
* core: add an extra flag to readv()/writev() API | Amar Tumballi | 2012-02-14 | 1 | -4/+4
  needed to implement a proper handling of open flag alterations using fcntl() on fd.

  Change-Id: Ic280d5db6f1dc0418d5c439abb8db1d3ac21ced0
  Signed-off-by: Amar Tumballi <amar@gluster.com>
  BUG: 782265
  Reviewed-on: http://review.gluster.com/2723
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix: Fix rename gfid handle unset | Pranith Kumar K | 2012-01-29 | 1 | -4/+4
  Change-Id: I365ef264056691914ad5bd620d8150f8b71ec887
  BUG: 785524
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/2698
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@gluster.com>
* core: add 'fremovexattr()' fop | Amar Tumballi | 2012-01-25 | 1 | -0/+54
  so operations can be done on fd for extended attribute removal

  Change-Id: Ie026f1b53793aeb4ae33e96ea5408c7a97f34bf6
  Signed-off-by: Amar Tumballi <amar@gluster.com>
  BUG: 766571
  Reviewed-on: http://review.gluster.com/778
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@gluster.com>
* core: get xattrs also as part of readdirp | Amar Tumballi | 2012-01-25 | 1 | -24/+56
  The readdirp_req() call sends a dict_t * as an argument, containing the xattr keys to fetch. Each entry returned in readdirp_rsp() carries a dictionary filled with the values of those xattrs.

  Change-Id: I8b7e1290740ea3e884e67d19156ce849227167c0
  Signed-off-by: Amar Tumballi <amar@gluster.com>
  BUG: 765785
  Reviewed-on: http://review.gluster.com/771
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@gluster.com>
* storage/posix: Pass correct size to sys_lgetxattr | root | 2012-01-23 | 1 | -1/+1
  We were passing op_ret (0) as the buffer size, instead of the size variable obtained from the previous sys_lgetxattr call.

  Signed-off-by: root <shishirng@gluster.com>
  Change-Id: I886dedc2ab752ac1feabe7a79725ea5f069d6865
  BUG: 783916
  Reviewed-on: http://review.gluster.com/2676
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Rahul C S <rahulcs@redhat.com>
  Reviewed-by: Amar Tumballi <amar@gluster.com>
* core: GFID filehandle based backend and anonymous FDs | Anand Avati | 2012-01-20 | 1 | -438/+292
  1. What
  --------
  This change introduces an infrastructure change in the filesystem which lets filesystem operation address objects (inodes) just by its GFID. Thus far GFID has been a unique identifier of a user-visible inode. But in terms of addressability the only mechanism thus far has been the backend filesystem path, which could be derived from the GFID only if it was cached in the inode table along with the entire set of dentry ancestry leading up to the root.

  This change essentially decouples addressability from the namespace. It is no more necessary to be aware of the parent directory to address a file or directory.

  2. Why
  -------
  The biggest use case for such a feature is NFS for generating persistent filehandles. So far the technique for generating filehandles in NFS has been to encode path components so that the appropriate inode_t can be repopulated into the inode table by means of a recursive lookup of each component top-down.

  Another use case is the ability to perform more intelligent self-healing and rebalancing of inodes with hardlinks and also to detect renames.

  A derived feature from GFID filehandles is anonymous FDs. An anonymous FD is an internal USABLE "fd_t" which does not map to a user opened file descriptor or to an internal ->open()'d fd. The ability to address a file by the GFID eliminates the need to have a persistent ->open()'d fd for the purpose of avoiding the namespace. This improves NFS read/write performance significantly eliminating open/close calls and also fixes some of today's limitations (like keeping an FD open longer than necessary resulting in disk space leakage)

  3. How
  -------
  At each storage/posix translator level, every file is hardlinked inside a hidden .glusterfs directory (under the top level export) with the name as the ascii-encoded standard UUID format string. For reasons of performance and scalability there is a two-tier classification of those hardlinks under directories with the initial parts of the UUID string as the directory names.

  For directories (which cannot be hardlinked), the approach is to use a symlink which dereferences the parent GFID path along with basename of the directory. The parent GFID dereference will in turn be a dereference of the grandparent with the parent's basename, and so on recursively up to the root export.

  4. Development
  ---------------
  4a. To leverage the ability to address an inode by its GFID, the technique is to perform a "nameless lookup". This means, to populate a loc_t structure as:

  loc_t {
      pargfid: NULL
      parent: NULL
      name: NULL
      path: NULL
      gfid: GFID to be looked up [out parameter]
      inode: inode_new () result [in parameter]
  }

  and performing such lookup will return in its callback an inode_t populated with the right contexts and a struct iatt which can be used to perform an inode_link () on the inode (without a parent and basename). The inode will now be hashed and linked in the inode table and findable via inode_find().

  A fundamental change moving forward is that the primary fields in a loc_t structure are now going to be (pargfid, name) and (gfid) depending on the kind of FOP. So far path had been the primary field for operations. The remaining fields only serve as hints/helpers.

  4b. If read/write is to be performed on an inode_t, the approach so far has been to: fd_create(), STACK_WIND(open, fd), fd_bind (in callback) and then perform STACK_WIND(read, fd) etc. With anonymous fds now you can do fd_anonymous (inode), STACK_WIND (read, fd). This results in great boost in performance in the inbuilt NFS server.

  5. Misc
  -------
  The inode_ctx_put[2] has been renamed to inode_ctx_set[2] to be consistent with the rest of the codebase.

  Change-Id: Ie4629edf6bd32a595f4d7f01e90c0a01f16fb12f
  BUG: 781318
  Reviewed-on: http://review.gluster.com/669
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@gluster.com>
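The two-tier handle layout described in section 3 of the GFID commit can be sketched as a path builder. The commit message only says that the tier directories are named after "the initial parts of the UUID string"; using two hex characters per tier is an assumption here, and the function names and the export path in the usage are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define GFID_STR_LEN 36 /* canonical UUID text: 32 hex digits + 4 dashes */

/* Format a 16-byte GFID as the standard lowercase UUID string. */
static void gfid_to_string(const unsigned char gfid[16],
                           char out[GFID_STR_LEN + 1])
{
    snprintf(out, GFID_STR_LEN + 1,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             gfid[0], gfid[1], gfid[2], gfid[3], gfid[4], gfid[5],
             gfid[6], gfid[7], gfid[8], gfid[9], gfid[10], gfid[11],
             gfid[12], gfid[13], gfid[14], gfid[15]);
}

/* Build <export>/.glusterfs/<tier1>/<tier2>/<uuid>. Assumption: each
 * tier is the next two hex characters of the UUID string. Returns the
 * snprintf() result so callers can detect truncation. */
static int gfid_handle_path(char *out, size_t n, const char *export_dir,
                            const unsigned char gfid[16])
{
    char uuid[GFID_STR_LEN + 1];

    assert(out != NULL && export_dir != NULL);
    gfid_to_string(gfid, uuid);
    return snprintf(out, n, "%s/.glusterfs/%02x/%02x/%s",
                    export_dir, gfid[0], gfid[1], uuid);
}
```

Because the handle is a hardlink to the file, opening this path reaches the same inode as the user-visible name without knowing any parent directory, which is what lets an operation address an object by GFID alone.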
* Avoid setting dict when size is -1 | Rahul C S | 2012-01-03 | 1 | -0/+5
  When lgetxattr fails and returns a size of -1, we still try to set the dict. Instead it should set the proper errno and exit.

  Change-Id: I282dc0765e562bd9bbcf852453cd3b72d918b269
  BUG: 771313
  Signed-off-by: Rahul C S <rahulcs@redhat.com>
  Reviewed-on: http://review.gluster.com/2555
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>