summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* features/quota: Improvements to quotaRaghavendra G2013-11-2625-1001/+3264
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Two stages of quota enforcement is done: Soft and hard quota Upon reaching soft quota limit on the directory it logs/alerts in the quota daemon log (ie DEFAULT_LOG_DIR/quotad.log) and no more writes allowed after hard quota limit. After reaching the soft-limit the daemon alerts the user/admin repeatively for every 'alert-time', which is configurable. * Quota enforcer is moved to server-side. It takes care of enforcing quota. Since enforcer doesn't have the cluster view, it relies on another service called quota-aggregator. Aggregator, on query can return the size of a directory based on the cluster view. Enforcer is always loaded in the server graph and is by passed if the feature is not enabled. Options specific to enforcer: server-quota - Specifies whether the feature is on/off. It is used to by pass the quota if turned off. deem-statfs - If set to on, it takes quota limits into consideration while estimating fs size. (df command). The algorithm followed is, i. Adjust statvfs based on limit configured on root. ii. If limit is set on the inode passed, use size/limits on that inode to populate statvfs. Otherwise, use size/limits configured on root. iii. Upon statvfs, update the ctx->size on the inode. iv. Don't let DHT aggregate, instead take the maximum of the usages from the subvols of the DHT, since each of it contains the complete information. Enforcer also makes use of gfid-to-path conversion functionality to work correctly when a client like nfs predominently relies on nameless lookups. * Quota Aggregator acts as a thin client to provide cluster view Its a lightweight *gluster client* process with no mount point, started upon enabling quota or restarting the volume. This is a single process run on each brick, which can answer queries on all volumes in the cluster. Its volfile stored in GLUSTERD_DEFAULT_WORKING_DIR/quotad/quotad.vol. Credits: Raghavendra Bhat <rabhat@redhat.com> Varun Shastry <vshastry@redhat.com> Shishir Gowda <sgowda@redhat.com> Kruthika Dhananjay <kdhananj@redhat.com> Brian Foster <bfoster@redhat.com> Krishnan Parthasarathi <kparthas@redhat.com> Change-Id: Id1cb25b414951da34c665a55f77385d482e0f9de BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/5952 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/marker: quota friendly changesRaghavendra G2013-11-267-90/+479
| | | | | | | | | | | | | | | | | | | | | | | | | * handles renames on dht linkfiles correctly * nameless lookup friendly changes. uses gfid-to-path conversion functionality from storage/posix to build ancestry till root. * log message cleanup. * build inode contexts in readdirp * Accounting still not correct with hardlinks. Credits: ======== Vijay Bellur <vbellur@redhat.com> Raghavendra Bhat <rabhat@redhat.com> Change-Id: I415b6fbbc9691f5a38d9fd3c5d083a61e578bb81 BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/5953 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* posix: placeholders for GFID to path conversionRaghavendra G2013-11-269-171/+1156
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | what? ===== The following is an attempt to generate the paths of a file when only its gfid is known. To find the path of a directory, the symlink handle to the directory maintained in the ".glusterfs" backend directory is read. The symlink handle is generated using the gfid of the directory. It (handle) contains the directory's name and parent gfid, which are used to recursively construct the absolute path as seen by the user from the mount point. A similar approach cannot be used for a regular file or a symbolic link since its hardlink handle, generated using its gfid, doesn't contain its parent gfid and basename. So xattrs are set to store the parent gfids and the number of hardlinks to a file or a symlink having the same parent gfid. When an user/application requests for the paths of a regular file or a symlink with multiple hardlinks, using the parent gfids stored in the xattrs, the paths of the parent directories are generated as mentioned earlier. The base names of the hardlinks (with the same parent gfid) are determined by matching the actual backend inode numbers of each entry in the parent directory with that of the hardlink handle. Xattr is set on a regular file, link, and symbolic link as follows, Xattr name : trusted.pgfid.<pargfidstr> Xattr value : <number of hardlinks to a regular file/symlink with the same parentgfid> If a regular file, hard link, symbolic link is created then an xattr in the above format is set in the backend. how to use? =========== This functionality can be used through getxattr interface. Two keys - glusterfs.ancestry.dentry and glusterfs.ancestry.path - enable usage of this functionality. A successful getxattr will have the result stored under same keys. Values will be, glusterfs.ancestry.dentry: -------------------------- A linked list of gf-dirent structures for all possible paths from root to this gfid. If there are multiple paths, the linked-list will be a series of paths one after another. Each path will be a series of dentries representing all components of the path. This key is primarily for internal usage within glusterfs. glusterfs.ancestry.path: ------------------------ A string containing all possible paths from root to this gfid. Multiple hardlinks of a file or a symlink are displayed as a colon seperated list (this could interfere with path components containing ':'). e.g. If there is a file "file1" in root directory with two hardlinks, "/dir2/link2tofile1" and "/dir1/link1tofile1", then [root@alpha gfsmntpt]# getfattr -n glusterfs.ancestry.path -e text file1 glusterfs.ancestry.path="/file1:/dir2/link2tofile1:/dir1/link1tofile1" Thanks Amar, Avati and Venky for the inputs. Original Author: Ramana Raja <rraja@redhat.com> BUG: 990028 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: I0eaa9101e333e0c1f66ccefd9e95944dd4a27497 Reviewed-on: http://review.gluster.org/5951 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Provide HA for pathinfo getxattrPranith Kumar K2013-11-262-14/+81
| | | | | | | | | | | | | | | | | | | | | | | Problem: afr_[f]getxattr_pathinfo_cbks fail the fop even when it succeeded on one of the bricks. This can happen if the last response to pathinfo [f]getxattr is a failure. Fix: Remember if any of the [f]getxattr_pathinfos are successful and send that as the op_ret/op_errno value to the xlators above. Note: Winding fop to a client xlator that is not connected to server produces an error log. Preventing that by not even winding fop when client xlator is DOWN. Change-Id: I846e8c47423ffcfa2eabffe8924534781a36841a BUG: 1032927 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6332 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* gNFS: More clean up for Gluster NFSSantosh Kumar Pradhan2013-11-256-76/+116
| | | | | | | | | | | | | | | | | | | | | | | | | | | | 1) Fix the typo in NFS default ACL The typo was introduced as part of the Fix to BZ 1009210 i.e. http://review.gluster.org/5980. The user ACL xattr structure was passed to default ACL xattr. 2) Clean up NFS code to avoid unnecessary SEGV in rpcsvc_drc_reconfigure() which was not validating the svc->drc. Add a routine rpcsvc_drc_deinit() to handle the clean up of DRC specific data structures. For init(), use rpcsvc_drc_init(). 3) nfs_init_state() was returning wrong value even if the registration with portmapper failed, causing the NFS server process to hang around. As a result it used to get SEGV during rpcsvc_drc_reconfigure(). 4) Clean up memfactor usage across nfs.c nfs3.c. Change-Id: I5cea26cb68dd8a822ec0ae104952f67fe63fa703 BUG: 1009210 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6329 Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* tests: add EXPECT_NOT macroRaghavendra G2013-11-241-1/+32
| | | | | | | | | | | | | | | | | We needed this macro while writing test cases for quota. With quota, a directory size is only guaranteed to be within some margin of quota limit, but not an accurate number. With not knowing what size to expect and EXPECT macro not complete enough to accept ranges of sizes, we can atleast write test-cases with EXPECT_NOT macro. After copying data to an empty file, it will be guaranteed the size will not be zero. This is good enough for quota test cases. Change-Id: I722ebd68044716a5eeaf0bd7e9aae61df8469017 BUG: 1022995 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6253 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* group-virt: To minimize 'split-brain' enable quorum under virt profileHarshavardhana2013-11-221-0/+2
| | | | | | | | | | | | | | | | | | Quorum as default is necessary when storing virtual machine images. It would be necessary to enable both server and client quorum Currently defaulted values are: ---------------- server-quorum-type=server quorum-type=auto ---------------- Change-Id: Ic2adb5856ce3c2589476e872e988cae6eeb9b25e BUG: 1032080 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/6340 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* fuse: revalidate group id cache on uid/gid change detectionAnand Avati2013-11-214-4/+25
| | | | | | | | | | | | | | | | | | | - Remember the uid and gid of the pid at the time of caching the group id list. - Next time when referring to the cache confirm that uid and gid of that pid has not changed since. If it has, treat it like a timeout/cache miss. - Solves group id caching issue caused when Samba runs on gluster FUSE mount and changes the uid/gid on a per syscall basis. Change-Id: I3382b037ff0b6d5eaaa36d9c898232543475aeda BUG: 1032438 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6320 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* syncops: expose @flags in syncop_rmdir()Anand Avati2013-11-214-5/+5
| | | | | | | | | Change-Id: I9b73c1db728e4cb3948fc118cceb292b21d48b96 BUG: 1021686 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6112 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cli: fix possible memory leaksBala.FA2013-11-212-1/+6
| | | | | | | | | BUG: 955548 Change-Id: Iae410712e7e6d7a76cd537c77f1919e3b4cdf6bb Signed-off-by: Bala.FA <barumuga@redhat.com> Reviewed-on: http://review.gluster.org/6328 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* Add description for git based installations.Vijay Bellur2013-11-211-2/+4
| | | | | | | | Change-Id: I60e445539f255b3220f885bd790f800e4c1ea55a Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6333 Reviewed-by: Lalatendu Mohanty <lmohanty@redhat.com> Tested-by: Lalatendu Mohanty <lmohanty@redhat.com>
* libglusterfs: use correct check for linux falloc.h availabilityBrian Foster2013-11-201-2/+11
| | | | | | | | | | | | We should check for HAVE_LINUX_FALLOC_H rather than HAVE_FALLOC_H to determine whether to include linux/falloc.h. Change-Id: I05eca4de2893a88d6b9cc5ebfce738708b9960d4 BUG: 1032378 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/6314 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
* Build storage/posix xlator if fallocate() does not existsEmmanuel Dreyfus2013-11-201-1/+12
| | | | | | | | | | | If fallocate() does not exists, just return EOPNOTSUPP BUG: 764655 Change-Id: I808114f733c88985519dc47fb7537e1ced1db077 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6289 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd: Add Zerofill FOP supportM. Mohan Kumar2013-11-205-3/+295
| | | | | | | | | | BUG: 1028673 Change-Id: I9ba8e3e6cf2f888640b4d2a2eb934a27ff903c42 Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/6290 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* mount/fuse: handle --gid-timeout=0 properlyAnand Avati2013-11-202-2/+5
| | | | | | | | | | | | | | Fix the bug which was using the timeout value as a flag to indicate if it was set (and hence would fail when timeout=0 would evaluate as False) Change-Id: Ie9a8f28d35603458cdac26c9a4e0343e7eda7344 BUG: 1032438 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6308 Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* gNFS: Coverity fix for CID 1128906Santosh Kumar Pradhan2013-11-201-11/+13
| | | | | | | | | | | | | Fix the Coverity issue introduced in RFE: NFS volume set/reset commit i.e. http://review.gluster.org/6236 Change-Id: I817b9da03a3ce7f5511303faea0c50dfdad60ff4 BUG: 1027409 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6307 Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Start rebalance only where requiredKaushal M2013-11-203-1/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | Gluster was starting rebalance processes on peers where it wasn't required in two cases. - For a normal rebalance command on a volume, rebalance processes were started on all peers instead of just the peers which contain bricks of the volume - For rebalance process being restarted by a volume sync, caused by a new peer being probed or a peer restarting, rebalance processes were started on all peers, for both a normal rebalance and for remove-brick needing rebalance. This patch adds a new check before starting rebalance process in the above two cases. - For rebalance process required by a rebalance command, each peer will check if it contains atleast one brick of the volume - For rebalance process required by a remove-brick command, each peer will check if it contains atleast one of the bricks being removed Change-Id: I512da16994f0d5482889c3a009c46dc20a8a15bb BUG: 1031887 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6301 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli: List only nodes which have rebalance started in rebalance statusKaushal M2013-11-202-214/+137
| | | | | | | | | | | | | | | | Listing the nodes on which rebalance hasn't been started is just giving out extraneous information. Also, refactor the rebalance status printing code into a single function and use it for both rebalance and remove-brick status. BUG: 1031887 Change-Id: I47bd561347dfd6ef76c52a1587916d6a71eac369 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6300 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* mgmt/glusterd: fix undefined sybmol error related to BDPranith Kumar K2013-11-193-1/+5
| | | | | | | | | | Change-Id: I2210f1ac7de04c6025c0ec02d998b626d41466ae BUG: 1028672 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6303 Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Do not build fallocate FUSE FOP if the system call does not existEmmanuel Dreyfus2013-11-191-0/+4
| | | | | | | | | BUG: 764655 Change-Id: Ica310e75bee16741b837e658981238c1b99c254f Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6288 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Python build flag detectionEmmanuel Dreyfus2013-11-191-0/+15
| | | | | | | | | | | Ask python-config for proper python build flags BUG: 764655 Change-Id: I7aede0f93637c61dbafc43580bff46af60f0f0d3 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6283 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* NetBSD missing backtrace(3) portability fixEmmanuel Dreyfus2013-11-193-1/+60
| | | | | | | | | | | | Implement backtrace(3) and backtrace_symbols(3) which do not exist in NetBSD While there, remove duplicate #include <stdio.h> BUG: 764655 Change-Id: Iccd695765906e085c3f8fcb670506d4fea68fa39 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6285 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Search gettext() in -lintlEmmanuel Dreyfus2013-11-191-0/+1
| | | | | | | | | | | | If gettext() is not found in libc, look it up in libintl (this is where NetBSD has it) BUG: 764655 Change-Id: Ifba8681b8603ead5d0b8587b71457250982077e1 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6287 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* autogen.sh portability fixesEmmanuel Dreyfus2013-11-191-2/+11
| | | | | | | | | | | | - Do not assume tar has --version, as BSD tar does not - Allow specifying python binary through PYTHONBIN in case it is e.g. python2.7 BUG: 764655 Change-Id: I71f0f4830e10915782775de811c92db8e6ab4c55 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6281 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* Fix xml compilation errorM. Mohan Kumar2013-11-191-1/+6
| | | | | | | | | | | | | | | Compiling GlusterFS without xml package results in following build error cli-rpc-ops.o: In function `gf_cli_status_cbk': /home/mohan/Work/glusterfs/cli/src/cli-rpc-ops.c:6430: undefined reference to `cli_xml_output_vol_status_tasks_detail' Change-Id: I49b3c46ac3340c40e372bef4690cedb41df20e8a Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/6295 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* BD fixes for coverity scanM. Mohan Kumar2013-11-191-5/+9
| | | | | | | | | BUG: 1028672 Change-Id: I2e7889fb113cedd2d5928b210149d3fd7b8b22ab Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/6292 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Fixes for ZF reported by coverityM. Mohan Kumar2013-11-193-3/+11
| | | | | | | | | BUG: 1028673 Change-Id: I7c75738cca22c81c5629d579ef5bea24000e622e Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/6291 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Have #include <signal.h> for kill(2)Emmanuel Dreyfus2013-11-182-0/+2
| | | | | | | | | BUG: 764655 Change-Id: I4d18c9a6c00cb4696645fcb437398562f00b9d24 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6284 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* NetBSD missing loff_t portability fixEmmanuel Dreyfus2013-11-171-0/+4
| | | | | | | | | | | define loff_t as off_t, is is already long long anyway. BUG: 764655 Change-Id: I99edda9b804475a8696c2d32ccf8eae152851e21 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6286 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli: add peerid to volume status xml outputBala.FA2013-11-143-0/+46
| | | | | | | | | | | | This patch adds <peerid> tag to bricks and nfs/shd like services to volume status xml output. BUG: 955548 Change-Id: I9aaa9266e4d56f632235eaeef565e92d757c0694 Signed-off-by: Bala.FA <barumuga@redhat.com> Reviewed-on: http://review.gluster.org/6162 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* zerofill: Change the type of len argument of glfs_zerofill() to off_tBharata B Rao2013-11-1424-36/+36
| | | | | | | | | | | | | | glfs_zerofill() can be potentially called to zero-out entire file and hence allow for bigger value of length parameter. Change-Id: I75f1d11af298915049a3f3a7cb3890a2d72fca63 BUG: 1028673 Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-on: http://review.gluster.org/6266 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com> Tested-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-by: Anand Avati <avati@redhat.com>
* fuse: Check the return status from state->resolve_nowv3.5.0qa1Vijaykumar M2013-11-142-7/+49
| | | | | | | | | Change-Id: I85fc6dd393449d365bb908b38c2827b58cb08171 BUG: 1030208 Signed-off-by: Vijaykumar M <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/6262 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gNFS: RFE for NFS connection behaviorSantosh Kumar Pradhan2013-11-1424-159/+1041
| | | | | | | | | | | | | | | Implement reconfigure() for NFS xlator so that volume set/reset wont restart the NFS server process. But few options can not be reconfigured dynamically e.g. nfs.mem-factor, nfs.port etc which needs NFS to be restarted. Change-Id: Ic586fd55b7933c0a3175708d8c41ed0475d74a1c BUG: 1027409 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6236 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gfapi: Closed the logfile fd in glfs_finiPoornima2013-11-141-0/+3
| | | | | | | | | | | | | | | | The logfile fd is not closed even after calling glfs_fini, hence in smb mount if connection to glusterfs volume fails at a point after the log file was opened, the fd would remain open until the process dies. This patch closes the logfile fd in glfs_fini. Change-Id: I608bfac9c6833b42750b0383ad26fd33ee378ee1 BUG: 1030228 Signed-off-by: Poornima <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/6263 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Transparent data encryption and metadata authenticationEdward Shishkin2013-11-1315-12/+8408
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | .. in the systems with non-trusted server This new functionality can be useful in various cloud technologies. It is implemented via a special encryption/crypt translator,which works on the client side and performs encryption and authentication; 1. Class of supported algorithms The crypt translator can support any atomic symmetric block cipher algorithms (which require to pad plain/cipher text before performing encryption/decryption transform (see glossary in atom.c for definitions). In particular, it can support algorithms with the EOF issue (which require to pad the end of file by extra-data). Crypt translator performs translations user -> (offset, size) -> (aligned-offset, padded-size) ->server (and backward), and resolves individual FOPs (write(), truncate(), etc) to read-modify-write sequences. A volume can contain files encrypted by different algorithms of the mentioned class. To change some option value just reconfigure the volume. Currently only one algorithm is supported: AES_XTS. Example of algorithms, which can not be supported by the crypt translator: 1. Asymmetric block cipher algorithms, which inflate data, e.g. RSA; 2. Symmetric block cipher algorithms with inline MACs for data authentication. 2. Implementation notes. a) Atomic algorithms Since any process in a stackable file system manipulates with local data (which can be obsoleted by local data of another process), any atomic cipher algorithm without proper support can lead to non-POSIX behavior. To resolve the "collisions" we introduce locks: before performing FOP->read(), FOP->write(), etc. the process should first lock the file. b) Algorithms with EOF issue Such algorithms require to pad the end of file with some extra-data. Without proper support this will result in losing information about real file size. Keeping a track of real file size is a responsibility of the crypt translator. A special extended attribute with the name "trusted.glusterfs.crypt.att.size" is used for this purpose. All files contained in bricks of encrypted volume do have "padded" sizes. 3. Non-trusted servers and Metadata authentication We assume that server, where user's data is stored on is non-trusted. It means that the server can be subjected to various attacks directed to reveal user's encrypted personal data. We provide protection against such attacks. Every encrypted file has specific private attributes (cipher algorithm id, atom size, etc), which are packed to a string (so-called "format string") and stored as a special extended attribute with the name "trusted.glusterfs.crypt.att.cfmt". We protect the string from tampering. This protection is mandatory, hardcoded and is always on. Without such protection various attacks (based on extending the scope of per-file secret keys) are possible. Our authentication method has been developed in tight collaboration with Red Hat security team and is implemented as "metadata loader of version 1" (see file metadata.c). This method is NIST-compliant and is based on checking 8-byte per-hardlink MACs created(updated) by FOP->create(), FOP->link(), FOP->unlink(), FOP->rename() by the following unique entities: . file (hardlink) name; . verified file's object id (gfid). Every time, before manipulating with a file, we check it's MACs at FOP->open() time. Some FOPs don't require a file to be opened (e.g. FOP->truncate()). In such cases the crypt translator opens the file mandatory. 4. Generating keys Unique per-file keys are derived by NIST-compliant methods from the a) parent key; b) unique verified object-id of the file (gfid); Per-volume master key, provided by user at mount time is in the root of this "tree of keys". Those keys are used to: 1) encrypt/decrypt file data; 2) encrypt/decrypt file metadata; 3) create per-file and per-link MACs for metadata authentication. 5. Instructions Getting started with crypt translator Example: 1) Create a volume "myvol" and enable encryption: # gluster volume create myvol pepelac:/vols/xvol # gluster volume set myvol encryption on 2) Set location (absolute pathname) of your master key: # gluster volume set myvol encryption.master-key /home/me/mykey 3) Set other options to override default options, if needed. Start the volume. 4) On the client side make sure that the file /home/me/mykey exists and contains proper per-volume master key (that is 256-bit AES key). This key has to be in hex form, i.e. should be represented by 64 symbols from the set {'0', ..., '9', 'a', ..., 'f'}. The key should start at the beginning of the file. All symbols at offsets >= 64 are ignored. 5) Mount the volume "myvol" on the client side: # glusterfs --volfile-server=pepelac --volfile-id=myvol /mnt After successful mount the file which contains master key may be removed. NOTE: Keeping the master key between mount sessions is in user's competence. ********************************************************************** WARNING! Losing the master key will make content of all regular files inaccessible. Mount with improper master key allows to access content of directories: file names are not encrypted. ********************************************************************** 6. Options of crypt translator 1) "master-key": specifies location (absolute pathname) of the file which contains per-volume master key. There is no default location for master key. 2) "data-key-size": specifies size of per-file key for data encryption Possible values: . "256" default value . "512" 3) "block-size": specifies atom size. Possible values: . "512" . "1024" . "2048" . "4096" default value; 7. Test cases Any workload, which involves the following file operations: ->create(); ->open(); ->readv(); ->writev(); ->truncate(); ->ftruncate(); ->link(); ->unlink(); ->rename(); ->readdirp(). 8. TODOs: 1) Currently size of IOs issued by crypt translator is restricted by block_size (4K by default). We can use larger IOs to improve performance. Change-Id: I2601fe95c5c4dc5b22308a53d0cbdc071d5e5cee BUG: 1030058 Signed-off-by: Edward Shishkin <edward@redhat.com> Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4667 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/dht - rebalance: handle the rebalance @ inode level (!fd level)Amar Tumballi2013-11-134-128/+148
| | | | | | | | | | | | | * migrate all the fd's on an inode to newer subvol after rebalance * use the migration in progress flag in inode, so all the operations on the inode can make use of it Change-Id: Ib807a46e927a1062688fc15119c916797c52a350 BUG: 1013456 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.org/5891 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* libglusterfs/inode: introduce new APIs for ctx handlingAmar Tumballi2013-11-133-21/+265
| | | | | | | | | | | | | | * inode_ctx_reset{0,1,2}() for reseting value1, value2, and both respectively * inode_ctx_get0() - to get the first value only * inode_ctx_set0() - to set the first value only * inode_ctx_get1() - to get the second value only * inode_ctx_set1() - to set the second value only Change-Id: I4dfbdac81d6a3f4e5784e060c76edabb1692ce03 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.org/5890 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
* bd: Add test case for bd xlatorM. Mohan Kumar2013-11-131-0/+131
| | | | | | | | | Change-Id: I73a0bfa7085d2e71b2489687fa53f5fe7d1e8ea1 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/6050 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd: Add support to create clone, snapshot and merge of LV images.M. Mohan Kumar2013-11-137-56/+654
| | | | | | | | | | | | | | | | | | | | | | | | | | Special xattr names "clone" & "snapshot" can be used to create full and linked clone of the LV images. GFID of destination posix file (to be mapped) is passed as a value to the xattr. Destination posix file must exist before running this operation. These operations form a basis for offloading storage related operations from QEMU to GlusterFS. Syntax for full clone: xattr name: "clone" value: "gfid-of-dest-file" Syntax for linked clone: xattr name: "snapshot" value: "gfid-of-dest-file" Syntax for merging: xattr name: "merge" value: "path-to-snapshot-file" Example: setfattr -n clone -v <gfid-of-dest-file> /media/source setfattr -n snapshot -v <gfid-of-dest-file> /media/source setfattr -n merge -v "/media/sn" /media/sn Change-Id: Id9f984a709d4c2e52a64ae75bb12a8ecb01f8776 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5626 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd: Add aio support to BD xlatorM. Mohan Kumar2013-11-137-24/+651
| | | | | | | | | | | | Volume option bd-aio controls AIO feature for BD xlator. Code taken from posix-aio.c Change-Id: Ib049bd59c9d3f9101d33939838322cfa808de053 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5748 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd: Add BD support to other xlatorsM. Mohan Kumar2013-11-133-26/+67
| | | | | | | | | | | | | | | | Make changes to distributed xlator to work with BD xlator. Unlike files, a block device can't be removed when its opened. So some part of the code were moved down to avoid this situation. Also before truncating a BD file its BD_XATTR should be set otherwise truncate will result in truncating posix file. So file is created with needed BD_XATTR and truncate is invoked. Also enables BD xlator in stripe volume type. Change-Id: If127516e261fac5fc5b137e7fe33e100bc92acc0 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5235 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd: posix/multi-brick support to BD xlatorM. Mohan Kumar2013-11-1321-3/+3289
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current BD xlator (block backend) has a few limitations such as * Creation of directories not supported * Supports only single brick * Does not use extended attributes (and client gfid) like posix xlator * Creation of special files (symbolic links, device nodes etc) not supported Basic limitation of not allowing directory creation is blocking oVirt/VDSM to consume BD xlator as part of Gluster domain since VDSM creates multi-level directories when GlusterFS is used as storage backend for storing VM images. To overcome these limitations a new BD xlator with following improvements is suggested. * New hybrid BD xlator that handles both regular files and block device files * The volume will have both POSIX and BD bricks. Regular files are created on POSIX bricks, block devices are created on the BD brick (VG) * BD xlator leverages exiting POSIX xlator for most POSIX calls and hence sits above the POSIX xlator * Block device file is differentiated from regular file by an extended attribute * The xattr 'user.glusterfs.bd' (BD_XATTR) plays a role in mapping a posix file to Logical Volume (LV). * When a client sends a request to set BD_XATTR on a posix file, a new LV is created and mapped to posix file. So every block device will have a representative file in POSIX brick with 'user.glusterfs.bd' (BD_XATTR) set. * Here after all operations on this file results in LV related operations. For example opening a file that has BD_XATTR set results in opening the LV block device, reading results in reading the corresponding LV block device. When BD xlator gets request to set BD_XATTR via setxattr call, it creates a LV and information about this LV is placed in the xattr of the posix file. xattr "user.glusterfs.bd" used to identify that posix file is mapped to BD. Usage: Server side: [root@host1 ~]# gluster volume create bdvol host1:/storage/vg1_info?vg1 host2:/storage/vg2_info?vg2 It creates a distributed gluster volume 'bdvol' with Volume Group vg1 using posix brick /storage/vg1_info in host1 and Volume Group vg2 using /storage/vg2_info in host2. [root@host1 ~]# gluster volume start bdvol Client side: [root@node ~]# mount -t glusterfs host1:/bdvol /media [root@node ~]# touch /media/posix It creates regular posix file 'posix' in either host1:/vg1 or host2:/vg2 brick [root@node ~]# mkdir /media/image [root@node ~]# touch /media/image/lv1 It also creates regular posix file 'lv1' in either host1:/vg1 or host2:/vg2 brick [root@node ~]# setfattr -n "user.glusterfs.bd" -v "lv" /media/image/lv1 [root@node ~]# Above setxattr results in creating a new LV in corresponding brick's VG and it sets 'user.glusterfs.bd' with value 'lv:<default-extent-size' [root@node ~]# truncate -s5G /media/image/lv1 It results in resizig LV 'lv1'to 5G New BD xlator code is placed in xlators/storage/bd directory. Also add volume-uuid to the VG so that same VG can't be used for other bricks/volumes. After deleting a gluster volume, one has to manually remove the associated tag using vgchange <vg-name> --deltag <trusted.glusterfs.volume-id:<volume-id>> Changes from previous version V5: * Removed support for delayed deleting of LVs Changes from previous version V4: * Consolidated the patches * Removed usage of BD_XATTR_SIZE and consolidated it in BD_XATTR. Changes from previous version V3: * Added support in FUSE to support full/linked clone * Added support to merge snapshots and provide information about origin * bd_map xlator removed * iatt structure used in inode_ctx. iatt is cached and updated during fsync/flush * aio support * Type and capabilities of volume are exported through getxattr Changes from version 2: * Used inode_context for caching BD size and to check if loc/fd is BD or not. * Added GlusterFS server offloaded copy and snapshot through setfattr FOP. As part of this libgfapi is modified. * BD xlator supports stripe * During unlinking if a LV file is already opened, its added to delete list and bd_del_thread tries to delete from this list when a last reference to that file is closed. Changes from previous version: * gfid is used as name of LV * ? is used to specify VG name for creating BD volume in volume create, add-brick. gluster volume create volname host:/path?vg * open-behind issue is fixed * A replicate brick can be added dynamically and LVs from source brick are replicated to destination brick * A distribute brick can be added dynamically and rebalance operation distributes existing LVs/files to the new brick * Thin provisioning support added. * bd_map xlator support retained * setfattr -n user.glusterfs.bd -v "lv" creates a regular LV and setfattr -n user.glusterfs.bd -v "thin" creates thin LV * Capability and backend information added to gluster volume info (and --xml) so that management tools can exploit BD xlator. * tracing support for bd xlator added TODO: * Add support to display snapshots for a given LV * Display posix filename for list-origin instead of gfid Change-Id: I00d32dfbab3b7c806e0841515c86c3aa519332f2 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/4809 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd_map: Remove bd_map xlatorM. Mohan Kumar2013-11-1335-4602/+20
| | | | | | | | | | | Remove bd_map xlator and CLI related changes. Change-Id: If7086205df1907127c1a1fa4ba603f1c48421d09 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5747 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gfapi: introduce glfs_readdir() and glfs_readdirplus() APIsAnand Avati2013-11-135-2/+96
| | | | | | | | | Change-Id: I6b233bf647585675f233898351bf593f251716cc BUG: 839950 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6201 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
* server/rpc: bricks goes offline and comes back online, with lots of "No such ↵Vikhyat Umrao2013-11-121-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | file or directory" log messages Problem: Messages were getting logged very frequently at log level INFO. [2013-03-01 11:34:28.029222] I [server3_1-fops.c:1541:server_open_cbk] vol-server: 993888: OPEN (null) (--) ==> -1 (No such file or directory) [2013-03-01 11:34:28.031579] I [server3_1-fops.c:252:server_inodelk_cbk] vol-server: 993896: INODELK (null) (--) ==> -1 (No such file or directory) [2013-03-01 11:34:28.034041] I [server3_1-fops.c:252:server_inodelk_cbk] vol-server: 993914: INODELK (null) (--) ==> -1 (No such file or directory) [2013-03-01 11:34:28.040435] I [server3_1-fops.c:1338:server_flush_cbk] vol-server: 993938: FLUSH -2 (--) ==> -1 (No such file or directory) Solution: Moved them to DEBUG log level if error number equlas to ENOENT else to ERROR log level. It will help in decreasing the size of log files at INFO log level. For server_open_cbk and for some other functions we already have a patch, below is the URL for it. URL- http://review.gluster.org/6241 This patch solves logging problem for functions server_inodelk_cbk and server_flush_cbk. Change-Id: I57372e851371e466f1674726015e28378b826f5f BUG: 1029372 Signed-off-by: Vikhyat Umrao<vumrao@redhat.com> Reviewed-on: http://review.gluster.org/6252 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* zerofill: Update API versionM. Mohan Kumar2013-11-121-1/+1
| | | | | | | | | | | | version 6 adds zerofill FOP BUG: 1028673 Change-Id: I27cfc48cd6f7f0f6daf94e1c9cfbe420a0d090af Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/6255 Reviewed-by: Bharata B Rao <bharata.rao@gmail.com> Tested-by: Bharata B Rao <bharata.rao@gmail.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/compress: Compression/DeCompression translatorPrashanth Pai2013-11-1111-1/+1256
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * When a writev call occurs, the client compresses the data before sending it to server. On the server, compressed data is decompressed. Similarly, when a readv call occurs, the server compresses the data before sending it to client. On the client, the compressed data is decompressed. Thus the amount of data sent over the wire is minimized. * Compression/Decompression is done using Zlib library. * During normal operation, this is the format of data sent over wire : <compressed-data> + trailer(8) The trailer contains the CRC32 checksum and length of original uncompressed data. This is used for validation. HOW TO USE ---------- Turning on compression xlator: gluster volume set <vol_name> compress on Configurable options: gluster volume set <vol_name> compress.compression-level 8 gluster volume set <vol_name> compress.min-size 50 Change-Id: Ib7a66b6f1f70fe002b7c513588cdf75c69370805 BUG: 923540 Original-author : Venky Shankar <vshankar@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Prashanth Pai <nullpai@gmail.com> Signed-off-by: Prashanth Pai <ppai@redhat.com> Reviewed-on: http://review.gluster.org/3251 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli: Set the o/p width of hostname to 8 charactersVijaykumar M2013-11-111-1/+1
| | | | | | | | | Change-Id: I91dcb19ba4d31c17e6041155c0e59af457b87f1b BUG: 1028871 Signed-off-by: Vijaykumar M <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/6245 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* server/rpc: Numerous entries of error - "No such file or directory" in ↵Vikhyat Umrao2013-11-111-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bricks log Problem: Messages were getting logged very frequently at log level INFO. One of the log file snippet - [2013-10-27 00:05:01.501355] I [server3_1-fops.c:1707:server_stat_cbk] 0-vol-server: 24846575: STAT (null) (--) ==> -1 (No such file or directory) [2013-10-27 00:05:01.505101] I [server3_1-fops.c:1707:server_stat_cbk] 0-vol-server: 24846577: STAT (null) (--) ==> -1 (No such file or directory) [2013-10-27 00:05:01.507299] I [server3_1-fops.c:1707:server_stat_cbk] 0-vol-server: 24846578: STAT (null) (--) ==> -1 (No such file or directory) [2013-10-20 19:50:35.554563] I [server3_1-fops.c:1538:server_open_cbk] 0-vol-server: 18714687: OPEN <gfid:01c70ca0-1952-4e82-abee-a07205757d8e> (01c70ca0-1952-4e82-abee-a07205757d8e) ==> -1 (No such file or directory) [2013-10-20 19:50:35.555520] I [server3_1-fops.c:1538:server_open_cbk] 0-vol-server: 18714697: OPEN <gfid:01c70ca0-1952-4e82-abee-a07205757d8e> (01c70ca0-1952-4e82-abee-a07205757d8e) ==> -1 (No such file or directory) [2013-10-20 19:50:35.558292] I [server3_1-fops.c:1538:server_open_cbk] 0-vol-server: 18714712: OPEN (null) (--) ==> -1 (No such file or directory) Solution: Moved them to DEBUG log level if error number equlas to ENOENT else to ERROR log level. It will help in decreasing the size of log files at INFO log level. Change-Id: I23d74320c9c21bbce4805c20295556cc2cc0a8d8 BUG: 808073 Signed-off-by: Vikhyat Umrao <vumrao@redhat.com> Reviewed-on: http://review.gluster.org/6241 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli: write 'volume rebalance' error message in xml format whenDawit Alemu2013-11-101-4/+12
| | | | | | | | | | | | | | | | | | | --xml is specified When 'volume rebalance' encounters an error the cli prints the error message in plain text independent of whether --xml is specified. This throws off client application that expect xml output (as mentioned in bz1026143). Now, if the --xml flag is supplied, the cli print 'volume rebalance' error messages in xml format. Change-Id: I16c6a7a4cdd2819eb73422ab849125986dc299a6 BUG: 1026143 Signed-off-by: Dawit Alemu <dalemu@redhat.com> Reviewed-on: http://review.gluster.org/6242 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>