summaryrefslogtreecommitdiffstats
path: root/xlators
Commit message (Collapse)AuthorAgeFilesLines
* common-utils: Move glusterd_is_service_running() to common-utilsKrutika Dhananjay2013-08-124-46/+9
| | | | | Change-Id: I96cbe03511cbecb112418da82c44c00fbab74ba3 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* Merge "features/marker: Fix an incorrect NULL check" into quotaVijay Bellur2013-08-111-3/+4
|\
| * features/marker: Fix an incorrect NULL checkVijay Bellur2013-08-111-3/+4
| | | | | | | | | | | | Change-Id: If9bb12b352af5a691bd17fc51f0273685ecb12e8 BUG: 969461 Signed-off-by: Vijay Bellur <vbellur@redhat.com>
* | features/marker: reduce severity of an annoying log.Vijay Bellur2013-08-111-2/+2
|/ | | | | Change-Id: If65129812b10afc19a22b2b0c468b53043bde1db BUG: 969461
* Merge "features/marker: refactor marker quota for better readability (part ↵Vijay Bellur2013-08-102-23/+38
|\ | | | | | | 1)." into quota
| * features/marker: refactor marker quota for better readability (part 1).Vijay Bellur2013-08-102-23/+38
| | | | | | | | | | Change-Id: I1133c5ca24f2b395103d470bde77be966d17d938 Signed-off-by: Vijay Bellur <vbellur@redhat.com>
* | Merge "features/quota: Add support for statedump" into quotaVijay Bellur2013-08-102-0/+41
|\ \ | |/ |/|
| * features/quota: Add support for statedumpVijay Bellur2013-08-102-0/+41
| | | | | | | | | | | | | | | | - dumps members of quota_priv_t. - also added validation count to keep track of number of validations done. Change-Id: I998fcccacf4bd7c61ead9ca9a489e0dc0e73763a Signed-off-by: Vijay Bellur <vbellur@redhat.com>
* | quota: saner defaults, min and max values for timeoutsKrishnan Parthasarathi2013-08-101-4/+4
|/ | | | | | | | Also changed default soft-limit percentage to 80% of configured quota limit. Change-Id: Ia07b569216189a6e3bedb5cdbf8ffeb9f7739444 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
* glusterd, quotad: volume-id option fixupsBrian Foster2013-08-092-2/+21
| | | | | | | | | | | A few little hacks to set the volume id on the quota server and a mapping option on quotad to map the volume name to the uuid passed via the lookup request. Change-Id: Ic151acb18ed29d2ee4ae5d1bc6841ae4a4de176a Original-author: Brian Foster <bfoster@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
* features/quota: Fix compilation warningsKrutika Dhananjay2013-08-092-5/+4
| | | | | | | | | | | | | | | | | | The following are the compilation warnings I encountered: ----------------------------------- In file included from quota.c:12:0: quota.h:190:1: error: 'packed' attribute ignored [-Werror=attributes] quota.c: In function 'quota_lookup_cbk': quota.c:618:23: error: assignment from incompatible pointer type [-Werror] quota.c:637:53: error: 'soft_lim' undeclared (first use in this function) quota.c:637:53: note: each undeclared identifier is reported only once for each function it appears in quota.c:608:28: error: unused variable 'size' [-Werror=unused-variable] cc1: all warnings being treated as errors ----------------------------------- Change-Id: I7a09e654a9cc064a423a5f8362f2a9c6abbc7edb Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* features/quota: minor fixes to enforcerv3.4quota1Raghavendra G2013-08-092-12/+5
| | | | | | | | | | * send size query to quotad only if limit is set on that inode. * don't check for loc->parent while querying size from quotad, since its a nameless lookup Change-Id: I10dc2f9d1e40875382040b53cb4ee5f6d9a27133 BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* features/quota: fixes to code reading limits from xattrs.Raghavendra G2013-08-092-44/+33
| | | | | | | | | | It was assumed that hard and soft limits are stored as two different xattrs on disk. However they are stored as two members of a structure which is stored as a value for a single key. Change-Id: I947fa5c375209c31fe1511bda0d5cb0e249af9ba BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* glusterd,cli: changes to quota list <path> ...Krutika Dhananjay2013-08-091-67/+11
| | | | | Change-Id: Ia37020c3aa11af6eed3af09cfe390b848b028b6a Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* glusterd: Clean up and fix glusterd_op_quota()Krutika Dhananjay2013-08-092-100/+70
| | | | | | | | | | | | | | | | | | | | ... and also fix cli logging In glusterd_op_quota(), * do not modify ret after going to 'out' as this causes the failure status (-1) to be overwritten, thereby causing the command to return 0 even on failure. * knock off additional labels like create_vol. * replace 'if' statements with 'switch case' statement. * delete only the 3 timeouts and the defaul-soft-limit, if present, from volinfo->dict, upon disablement of quota. Change-Id: If486a5373b66f2379d6d041a974d9b824fcb8518 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* glusterd: Changes to 'quota remove' subcommand behaviorKrutika Dhananjay2013-08-091-44/+39
| | | | | Change-Id: Ifdc60071146587dc5c60d9a53a92d49b3487fd82 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* glusterd: club limit-usage and soft-limit into a single commandKrutika Dhananjay2013-08-091-165/+83
| | | | | Change-Id: I5f680675576aeec584b497eb25dd804a9dd6d690 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* quota: Fix initialisation of privKrutika Dhananjay2013-08-081-0/+2
| | | | | | | Original-author: Vijay Bellur <vbellur@redhat.com> Change-Id: Iea21ef1cdfb78c79482ad02f81734516b7818714 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* features/quota: Allow the gluster 'special' processes to supercede the limitsVijay Bellur2013-08-081-4/+18
| | | | | | | | | Don't block the gluster internal processes like rebalance, gsyncd, self heal etc from the disk quotas and the xattrs setting. Solution: Allow all the clients with negative PID. Change-Id: Iaeaa8096e00d48b2a4c3f5df61d103da0b3d6598
* features/quota: Fix compilation warningsVijay Bellur2013-08-082-8/+24
| | | | Change-Id: Ie0b3af8b52f2d909c61094bdcaccfd724ff4ecc0
* features/quota: Introduce quota log helper function.Vijay Bellur2013-08-082-0/+55
| | | | Change-Id: I38077c7adc497b314f4037cbbb116458a26ed589
* glusterd, cli: Provide status of quotad in 'volume status'Krutika Dhananjay2013-08-085-19/+80
| | | | | | Change-Id: I5e90376ecfe11ae5a3bca936d9d9acdd54c337d7 BUG: 969461 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* features/quota: design changesRaghavendra G2013-08-0810-1384/+1741
| | | | | | | | | | | | | | * hard and soft limits are persisted in xattrs of the inode. Associating limits with inode instead of maintaining as a global list helps us to scale better. * quotad-aggregator acts as a special client to provide cluster view through an rpc program. Quota enforcer loaded on brick uses this to get aggregated directory sizes. Aggregated sizes are cached for a timeout period in in-memory inode contexts. Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: I2ab508d9d4fe224bc1d8cf01cf2b7969dd4200bb BUG: 969461
* Merge "glusterd: Move timeout options and default-soft-limit to quota ↵Krishnan Parthasarathi2013-08-042-80/+6
|\ | | | | | | xlator" into quota-improvements
| * glusterd: Move timeout options and default-soft-limit to quota xlatorKrutika Dhananjay2013-07-312-80/+6
| | | | | | | | | | | | | | | | | | Write the 3 timeout options {soft-timeout, hard-timeout, alert-time} and default-soft-limit, if explicitly set, into brick volfiles. Change-Id: Ie3229a8ab1b081a5936defd4f977afc8a19dad50 BUG: 969461 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* | glusterd: Spawn quotad on every nodeKrutika Dhananjay2013-07-313-31/+46
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... and also trigger a reconfigure when the following quota sub-commands are executed: a. default-soft-limit, b. hard-timeout, c. soft-timeout, and d. alert-time. Also start/restart/stop quotad only when quota is enabled or disabled on any volume. Tests performed in a two node cluster: a. Create and start a volume, enable quota on it and check if quotad is spawned on both the nodes. b. Execute all quota sub-commands on the volume except 'enable' and 'disable' and verify that the pid of quota daemon doesn't change. c. Stop the volume and verify that quotad is stopped. d. Start it again. Quotad must be started now. e. Create, start and enable quota on a second volume, verify that the pid of quotad changes on both the nodes (indicating a restart). f. Disable quota on one of the volumes and verify that quotad's pid changes. g. Disable quota on the second volume too and verify that quotad is stopped on both the nodes. h. Enable quota again on one of the volumes, and verify that quotad is started on both the nodes. i. Add a new node into this cluster and verify that quotad is spawned on this node too. Change-Id: Ie93ab69c685051e196c377cff15078a1cde17fca BUG: 969461 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* features/quota: Improvements to quotaVarun Shastry2013-07-2919-1140/+2897
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Old implementation * Client side implementation of quota - Not secure - Increased traffic in updating the ctx New Implementation * 2 stages of quota implementation is done: Soft and hard quota Upon reaching soft quota limit on the directory it logs/alerts in the quota daemon log (ie DEFAULT_LOG_DIR/quotad.log) and no more writes allowed after hard quota limit. After reaching the soft-limit the daemon alerts the user/admin repeatively for every 'alert-time', which is configurable. * Quota is moved to server-side. There will be 2 quota xlators i. Quota Server It takes care of the enforcing the quota and maintains the context specific to the brick. Since this doesn't have the complete picture of the cluster, cluster wide usage is updated from the quota daemon. This updated context is saved and used for the enforcement. It updates its context by searching the QUOTA_UPDATE_KEY from the dict in the setxattr call, and is updated from nowhere else. The quota is always loaded in the server graph and is by passed if the feature is not enabled. Options specific to quota-server: server-quota - Specifies whether the features is on/off. It is used to by pass the quota if turned off. deem-statfs - If set to on, it takes quota limits into consideration while estimating fs size. (df command) ii. Quota Daemon This is the new xlator introduced with this patch. Its the *gluster client* process with no mount point, started upon enabling quota or restarting the volume. This is a single process for all the volumes in the cluster. Its volfile stored in GLUSTERD_DEFAULT_WORKI_DIR/quotad/quotad.vol. It queries for the sizes on all the bricks, aggregates the size and sends back the updated size, periodically. The timeout between successive updation is configurable and typically/by default more for below-soft-quota usage and less for above-soft-quota usage. It maintains the timeout inside the limit structure based on the usage; below soft limit and above soft limit. There will be thread running per volume which iterates through the list and decides whether the size to be queried in the current iteration based on its timeout. It takes the next iteration time taking the least of the timeouts in the list of entries. Maintains a separate inode table for each volume in the quotad. In the first iteration it builds the table for quota-dirs (dirs on which limit is set) and its components. Options specific to quotad: hard-timeout - Timeout for updation of usage to the quota-server when the usage is crosses the soft-limit. soft-timeout - Timeout for the updation of usage to the quota-server when the usage is below soft-limit. alert-time - Frequency of logging after the usage reached soft limit. Options common to both: default-soft-limit - This is used when individual paths are not configured with soft-limit and default value of this option is 90% of the hard-limit. limit-set - String containing all the limits. Thus in the current implementation we'll have 2 quota xlators: one in server graph and one in trusted client (quota daemon) of which the sole purpose will be to aggregate the quota size xattrs from all the bricks and send the same to server quota xlator. * Changes in glusterd and CLI A single volfile is created for all the volumes, similar to nfs volfile. All files related to quota client (volfile, pid etc) are stored in GLUSTERD_DEFAULT_WORK_DIR/quotad/. The new pattern of the quota limit stores in limit-set = <single-dir-limit>[,<single-dir-limit>] single-dir-limit = <abs-path>:<hard-limit>[:<soft-limit-in-percent>] It also introduces new options: volume quota <VOLNAME> {enable|disable|list [<path> ...]|remove <path>| default-soft-limit <percent>} | volume quota <VOLNAME> {limit-usage <path> <size> |soft-limit <path> <percent>} | volume quota <VOLNAME> {alert-time|soft-timeout|hard-timeout} {<time>} Credit: Raghavendra Bhat <rabhat@redhat.com> Varun Shastry <vshastry@redhat.com> Shishir Gowda <sgowda@redhat.com> Kruthika Dhananjay <kdhananj@redhat.com> Brian Foster <bfoster@redhat.com> Krishnan Parthasarathi <kparthas@redhat.com> Change-Id: I16ec5be0c2faaf42b14034b9ccaf17796adef082 BUG: 969461 Signed-off-by: Varun Shastry <vshastry@redhat.com>
* features/changelog: fixes when enabling changelogVenky Shankar2013-07-252-29/+41
| | | | | | | | | | | | | Other enhancements being: * ignore fops made by rebalance * ignore internally triggered fops BUG: 987734 Change-Id: I7dd164ae3c209fdb8ec43a27e67b8846f937c93b Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/5380 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* write-behind: preserve error returned as-isAmar Tumballi2013-07-241-5/+1
| | | | | | | | | Change-Id: Ib766403774c1323e0bbddafedeaa47e7fa3a59fa Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 987415 Reviewed-on: http://review.gluster.org/5296 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* core: increase the auxillary group limit to 65536Anand Avati2013-07-243-7/+25
| | | | | | | | | | | | | Make the allocation of groups dynamic and increase the limit to 65536. Change-Id: I702364ff460e3a982e44ccbcb3e337cac9c2df51 BUG: 953694 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5111 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/marker: pass xdata in marker_unlink()Venky Shankar2013-07-242-1/+6
| | | | | | | | | | Change-Id: Ia310af96b25f29351f3adc4bbc97aea271df7673 BUG: 987747 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/5379 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* storage/posix: Fix conditional compiling for syncfsPranith Kumar K2013-07-241-1/+4
| | | | | | | | | Change-Id: Ief22e1c0f2b5074060752d70da41ae93f1028d62 BUG: 927146 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/5381 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: mark linkfile creation/deletion as internal fopshishir gowda2013-07-241-9/+39
| | | | | | | | | | | | | | | | | | | Currently dht creates/deletes linkfiles for various ops like rename/linking and when layout changes. dht_linkfile_create already sends a key GLUSTERFS_INTERNAL_FOP_KEY in dict to identify this as an internal fop and not user based op. Enhancing rename related links/unlinks to send this key too. Marker/changelog or other xlators can now identify these as internal fops and handle them accordingly Change-Id: Ib1ca789e6dbce48703c55ad3f4f88f7cd2df3d06 BUG: 987428 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/5335 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* storage/posix: implement batched fsync in a single threadAnand Avati2013-07-235-1/+288
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because of the extra fsync()s issued by AFR transaction, they could potentially "clog" all the io-threads denying unrelated operations from making progress. This patch assigns a dedicated thread to issues fsyncs, as an experimental feature to understand performance characteristics with the approach. As a basis, incoming individual fsync requests are grouped into batches, falling in the same @batch-fsync-delay-usec window of time. These windows can extend in practice, as processing of the previous batch can take longer than @batch-fsync-delay-usec while new requests are getting batched. The feature support three modes (similar to the -S modes of fs_mark) - syncfs: In this mode one syncfs() is issued per batch, instead of N fsync()s (one per file.) - syncfs-single-fsync: In this mode one syncfs() is issued per batch (which, on Linux, guarantees the completion of write-out of dirty pages in the filesystem up to that point) and one single fsync() to synchronize or flush the controller/drive cache. This corresponds to -S 2 of fsmark. - syncfs-reverse-fsync: In this mode, one syncfs() is issued per batch, and all the open files in that batch are fsync()'ed in the reverse order of the queue. This corresponds to -S 4 of fsmark. - reverse-fsync: In this mode, no syncfs() is issued and all the files in the batch are fsync()'ed in the reverse order. This corresponds to -S 3 of fsmark. Change-Id: Ia1e170a810c780c8d80e02cf910accc4170c4cd4 BUG: 927146 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4746 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Handle parallel hardlinks self-healPranith Kumar K2013-07-231-0/+29
| | | | | | | | | Change-Id: Ieda11870c65edae500140b6c061f15a7b3f264f3 BUG: 986905 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/5370 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* move 'xlators/marker/utils/' to 'geo-replication/' directoryAvra Sengupta2013-07-2219-3982/+1
| | | | | | | | | | Change-Id: Ibd0faefecc15b6713eda28bc96794ae58aff45aa BUG: 847839 Original Author: Amar Tumballi <amarts@redhat.com> Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5133 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/changelog: changelog translatorAvra Sengupta2013-07-2228-14/+5020
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the initial version of the Changelog Translator. What is it ----------- Goal is to capture changes performed on a GlusterFS volume. The translator needs to be loaded on the server (bricks) and captures changes in a plain text file inside a configured directory path (controlled by "changelog-dir", should be somewhere in <export>/.glusterfs/changelog by default). Changes are classified into 3 types: - Data: : TYPE-I - Metadata : TYPE-II - Entry : TYPE-III Changelog file is rolled over after a certain time interval (defauls to 60 seconds) after which a changelog is started. The thing to be noted here is that for a time interval (time slice) multiple changes for an inode are recorded only once (ie. say for 100+ writes on an inode that happens within the time slice has only a single corresponding entry in the changelog file). That way we do not bloat up the changelog and also save lots of writes. Changelog Format ----------------- TYPE-I and TYPE-II changes have the gfid on the entity on which the operation happened. TYPE-III being a entry op requires the parent gfid and the basename. Changelog format has been kept to a minimal and it's upto the consumers to do the heavy loading of figuring out deletes, renames etc.. A single changelog file records all three types of changes, with each change starting with an identifier ("D": DATA, "M": METADATA and "E": ENTRY). Option is provided for the encoding type (See TUNABLES). Consumers ---------- The only consumer as of today would be geo-replication, although backup utilities, self-heal, bit-rot detection could be possible consumers in the future. CLI ---- By default, change-logging is disabled (the translator is present in the server graph but does nothing). When enabled (via cli) each brick starts to log the changes. There are a set of tunable that can be used to change the translators behaviour: - enable/disable changelog (disabled by default) gluster volume set <volume> changelog {on|off} - set the logging directory (<brick>/.glusterfs/changelogs is the default) gluster volume set <volume> changelog-dir /path/to/dir - select encoding type (binary (default) or ascii) gluster volume set <volume> encoding {binary|ascii} - change the rollover time for the logs (60 secs by default) gluster volume set <volume> rollover-time <secs> - when secs > 0, changelog file is not open()'d with O_SYNC flag - and fsync is trigerred periodically every <secs> seconds. gluster volume set <volume> fsync-interval <secs> features/changelog: changelog consumer library (libgfchangelog) A shared library is provided for the consumer of the changelogs for easy acess via APIs. Application can link against this library and request for changelog updates. Conversion of binary logs to human-readable ascii format is also taken care by the library which keeps a copy of the changelog in application provided working directory. Change-Id: I75575fb7f1c53d2bec3dba1a329ea7bb3c628497 BUG: 847839 Original Author: Venky Shankar <vshankar@redhat.com> Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5127 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* fuse: auxiliary gfid mount supportRaghavendra G2013-07-198-124/+1279
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * files can be accessed directly through their gfid and not just through their paths. For eg., if the gfid of a file is f3142503-c75e-45b1-b92a-463cf4c01f99, that file can be accessed using <gluster-mount>/.gfid/f3142503-c75e-45b1-b92a-463cf4c01f99 .gfid is a virtual directory used to seperate out the namespace for accessing files through gfid. This way, we do not conflict with filenames which can be qualified as uuids. * A new file/directory/symlink can be created with a pre-specified gfid. A setxattr done on parent directory with fuse_auxgfid_newfile_args_t initialized with appropriate fields as value to key "glusterfs.gfid.newfile" results in the entry <parent>/bname whose gfid is set to args.gfid. The contents of the structure should be in network byte order. struct auxfuse_symlink_in { char linkpath[]; /* linkpath is a null terminated string */ } __attribute__ ((__packed__)); struct auxfuse_mknod_in { unsigned int mode; unsigned int rdev; unsigned int umask; } __attribute__ ((__packed__)); struct auxfuse_mkdir_in { unsigned int mode; unsigned int umask; } __attribute__ ((__packed__)); typedef struct { unsigned int uid; unsigned int gid; char gfid[UUID_CANONICAL_FORM_LEN + 1]; /* a null terminated gfid string * in canonical form. */ unsigned int st_mode; char bname[]; /* bname is a null terminated string */ union { struct auxfuse_mkdir_in mkdir; struct auxfuse_mknod_in mknod; struct auxfuse_symlink_in symlink; } __attribute__ ((__packed__)) args; } __attribute__ ((__packed__)) fuse_auxgfid_newfile_args_t; An initial consumer of this feature would be geo-replication to create files on slave mount with same gfids as that on master. It will also help gsyncd to access files directly through their gfids. gsyncd in its newer version will be consuming a changelog (of master) containing operations on gfids and sync corresponding files to slave. * Also, bring in support to heal gfids with a specific value. fuse-bridge sends across a gfid during a lookup, which storage translators assign to an inode (file/directory etc) if there is no gfid associated it. This patch brings in support to specify that gfid value from an application, instead of relying on random gfid generated by fuse-bridge. gfids can be healed through setxattr interface. setxattr should be done on parent directory. The key used is "glusterfs.gfid.heal" and the value should be the following structure whose contents should be in network byte order. typedef struct { char gfid[UUID_CANONICAL_FORM_LEN + 1]; /* a null terminated gfid * string in canonical form */ char bname[]; /* a null terminated basename */ } __attribute__((__packed__)) fuse_auxgfid_heal_args_t; This feature can be used for upgrading older geo-rep setups where gfids of files are different on master and slave to newer setups where they should be same. One can delete gfids on slave using setxattr -x and .glusterfs and issue stat on all the files with gfids from master. Thanks to "Amar Tumballi" <amarts@redhat.com> and "Csaba Henk" <csaba@redhat.com> for their inputs. Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: Ie8ddc0fb3732732315c7ec49eab850c16d905e4e BUG: 952029 Reviewed-on: http://review.gluster.com/#/c/4702 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.org/4702 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* mount/fuse: unlink the inode on revalidate if entry not foundBrian Foster2013-07-181-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | If an inode/dentry is linked via a client and removed via a separate client, the inode/dentry mapping in the initial client remains. A lookup of the removed name on the initial client typically returns ENOENT once the associated caches expire. If the initial client has multiple dentries linked to the same inode, however, lookups on the non-removed dentry create windows of time where lookups on the stale/removed name return successfully. This occurs because the stale mapping resolves to the still valid inode and tricks md-cache into returning valid lookup data. To correct this situation, unlink the stale inode mapping on a failed (ENOENT) revalidation lookup (i.e., when fuse has resolved the inode but a lookup returns ENOENT). Note that with this change, the state still occurs until an md-cache window has expired, allowed a lookup to pass through to the server and given the fuse translator an opportunity to clean up. Change-Id: I47dde2f11e2ef5b8dd51e9ac8be0f36cdb5081a3 BUG: 985074 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/5337 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* locks: Added an xdata-based 'cmd' for inodelk count in a given domainshishir gowda2013-07-184-20/+67
| | | | | | | | | | | | | | | | | | | | Following is the semantics of the 'cmd': 1) If @domain is NULL - returns no. of locks blocked/granted in all domains 2) If @domain is non-NULL- returns no. of locks blocked/granted in that domain 3) If @domain is non-existent - returns '0'; This is important since locks xlator creates a domain in a lazy manner. where @domain - a string representing the domain. Change-Id: I5e609772343acc157ca650300618c1161efbe72d BUG: 951195 Original-author: Krishnan Parthasarathi <kparthas@redhat.com> Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4889 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* dht: fix dht_discover_cbk doing a wrong layout set.shishir gowda2013-07-171-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | with the sequence of operations are like below, we have issues with current code (MP == mountpoint): T0,MP1# mkdir /abcd (Succeeds on hash_subvol) T1,MP2# mkdir /abcd (Gets EEXIST as dir exists in hash_subvol) T2,MP2# mkdir /.gfid/<abcd's gfid>/xyz (lookup happens on abcd's gfid, calls dht_discover) T3,MP1# (Completes mkdir(), goes to dir_selfheal to set the layouts). T4,MP2# (dht_discover_cbk gets success for lookup as the entry existed, as layout is not yet written, it says normalize done, found holes). T5,MP2# (as layout anomaly is not considered an issue in this patch, dht_layout_set happens on inode, with all xlators pointing to 0s) T6,MP1# (completes mkdir call, inode has proper layouts) T7,MP2# mkdir /.gfid/<abcd's gfid>/xyz fails with ENOENT (with log saying no subvol found for hash value of xyz. Porting Amar's fix from down-stream beta branch. Change-Id: Ibdc37ee614c96158a1330af19cad81a39bee2651 BUG: 982913 Original-author: Amar Tumballi <amarts@redhat.com> Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/5302 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: Prevent dht_access from going into a loop.shishir gowda2013-07-153-1/+40
| | | | | | | | | | | | | | | | | If access fails with ENOTCONN, do not wind to same subvol. We wind to first-up-subvol if access fails with ENOTCONN. In few cases, if dht has only 1 subvolume, and access fails with ENOTCONN, we go into a infinite loop of winding to same subvol The fix is to check if we previously wound to same subvol, and fail if first-up-subvol is same. Change-Id: Ib5d3ce7d33e8ea09147905a7df1ed280874fa549 BUG: 983431 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/5319 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* afr: customize client-pid=-1 xtime aggregation to tolerate a replica downAvra Sengupta2013-07-151-2/+9
| | | | | | | | | | | | Using the new 'pluggable policies' API of libxlator. Change-Id: Ie7528182dff8fb42c6e8287a106d3057944df775 BUG: 847839 Original Author: Csaba Henk <csaba@redhat.com> Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4904 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libxlator: implement pluggable aggregation policiesAvra Sengupta2013-07-155-47/+158
| | | | | | | | | | | | | | | | | The API is described in libxlator.h. Behavior remains the same for this commit; this is a preparatory step for per-translator customization of aggregation. Change-Id: I5d42923af59b2fd78e1ff59c12763875b57c5190 BUG: 847839 Original Author: Csaba Henk <csaba@redhat.com> Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4903 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: node-uuid for directories winds to all subvolumesAvra Sengupta2013-07-151-2/+6
| | | | | | | | | | | | | | | this works similar to pathinfo now except that the request is sent to all subvolumes of dht. Underlying replica selects it's subvolume in a round-robin fashion till one of them returns successfully. Change-Id: Ie46c5f7090d04d8c2e487b209916ae6791e94624 BUG: 847839 Original Author: Venky Shankar <vshankar@redhat.com> Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5225 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/*: get logic to calculate min() of the 'stime' xattrAvra Sengupta2013-07-144-2/+120
| | | | | | | | | | | | | | | | | | | | | | * in both distribute and replicate (ignoring stripe for now), add logic to calculate the min() of stime values. * What is a 'stime' ? Why is this required: - stime means 'slave xtime', mainly used to keep track of slave node's sync status when distributed geo-replication is used. Logic of calculating 'min()' for this stime is very important as in case of crashes/reboots/shutdown, we will have to 'restart' with crawling from stime time value from the mount point, which gives the 'min()' of all the bricks, which means, we don't miss syncing any files in the above cases. Change-Id: I2be8d434326572be9d4986db665570a6181db1ee BUG: 847839 Original Author: Amar Tumballi <amarts@redhat.com> Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4893 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* mount/fuse: Provide option to use/not use kernel-readdirpPranith Kumar K2013-07-123-2/+19
| | | | | | | | | | | | | | By default fuse kernel readdirp usage in fuse xlator is off. When mount option use-readdirp=yes is provided it starts using fuse-kernel's readdirp. Change-Id: Id37edc53b1adc1638186d956c2f74c1e4e48aa59 BUG: 983477 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/5322 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Unlink dst file after cleanup during migrationshishir gowda2013-07-121-17/+17
| | | | | | | | | | | | | | | | | If a rename happens during migration, unlink fails. This leads to stale xattrs, and Sticky bits still being set. By removing the xattrs and Sticky bits from dst (through fd ops), the stale link file would be cleaned up eventually (even if unlink fails on src). Change-Id: Iec537d021905438327a20e1d811aa06e74034364 BUG: 983399 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/5316 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: If linkfile unlink fails with ENOTCONN, do not failshishir gowda2013-07-111-1/+2
| | | | | | | | | | | | | | Currently if linkfile fails with ENOENT, we do not fail. We also need to treat failures with ENOTCONN as success, as if cached subvol is up, rm of a file should succeed. A stale linkfile will get removed later Change-Id: I71d136847933351ed9e2c939bda4a69bc96a3cfc BUG: 983416 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/5317 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Give up biglock before brick's rpc unrefKrishnan Parthasarathi2013-07-111-1/+5
| | | | | | | | | | | | | This is to prevent the possibility of a deadlock when rpc_connection_cleanup being called in the same thread as rpc_clnt_unref Change-Id: Ia4dcc0a8a6e6158d4ddec68b780fccbc4cd64adb BUG: 962619 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5321 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>