glusterfs.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	cluster/afr: Fix data loss due to race between sh and ongoing write	Krutika Dhananjay	2015-12-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: When IO is happening on a file and a brick goes down comes back up during this time, protocol/client translator attempts reopening of the fd on the gfid handle of the file. But if another client renames this file while a brick was down && writes were in progress on it, once this brick is back up, there can be a race between reopening of the fd and entry self-heal replaying the effect of the rename() on the sink brick. If the reopening of the fd happens first, the application's writes continue to go into the data blocks associated with the gfid. Now entry-self-heal deletes 'src' and creates 'dst' file on the sink, marking dst as a 'newentry'. Data self-heal is also completed on 'dst' as a result and self-heal terminates. If at this point the application is still writing to this fd, all writes on the file after self-heal would go into the data blocks associated with this fd, which would be lost once the fd is closed. The result - the 'dst' file on the source and sink are not the same and there is no pending heal on the file, leading to silent corruption on the sink. Fix: Leverage http://review.gluster.org/#/c/12816/ to ensure the gfid handle path gets saved in .glusterfs/unlink until the fd is closed on the file. During this time, when self-heal sends mknod() with gfid of the file, do the following: link() the gfid handle under .glusterfs/unlink to the new path to be created in mknod() and rename() the gfid handle to go back under .glusterfs/ab/cd/. Change-Id: I86ef1f97a76ffe11f32653bb995f575f7648f798 BUG: 1292379 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/13001 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	posix: posix_make_ancestryfromgfid shouldn't log ENOENT	vmallika	2015-08-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	posix_make_ancestryfromgfid shouldn't log ENOENT and it should set proper op_errno Change-Id: I8a87f30bc04d33cab06c91c74baa9563a1c7b45d BUG: 1251449 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/11861 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Manikandan Selvaganesh <mselvaga@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
*	Porting new log messages for posix	Hari Gowtham	2015-06-17	1	-23/+30
\| \| \| \| \| \| \| \| \| \|	Change-Id: I29bdeefb755805858e3cb1817b679cb6f9a476a9 BUG: 1194640 Signed-off-by: Hari Gowtham <hgowtham@dhcp35-85.lab.eng.blr.redhat.com> Reviewed-on: http://review.gluster.org/9893 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	build: do not #include "config.h" in each file	Niels de Vos	2015-05-29	1	-5/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of including config.h in each file, and have the additional config.h included from the compiler commandline (-include option). When a .c file tests for a certain #define, and config.h was not included, incorrect assumtions were made. With this change, it can not happen again. BUG: 1222319 Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10808 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	Avoid conflict between contrib/uuid and system uuid	Emmanuel Dreyfus	2015-04-04	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	glusterfs relies on Linux uuid implementation, which API is incompatible with most other systems's uuid. As a result, libglusterfs has to embed contrib/uuid, which is the Linux implementation, on non Linux systems. This implementation is incompatible with systtem's built in, but the symbols have the same names. Usually this is not a problem because when we link with -lglusterfs, libc's symbols are trumped. However there is a problem when a program not linked with -lglusterfs will dlopen() glusterfs component. In such a case, libc's uuid implementation is already loaded in the calling program, and it will be used instead of libglusterfs's implementation, causing crashes. A possible workaround is to use pre-load libglusterfs in the calling program (using LD_PRELOAD on NetBSD for instance), but such a mechanism is not portable, nor is it flexible. A much better approach is to rename libglusterfs's uuid_* functions to gf_uuid_* to avoid any possible conflict. This is what this change attempts. BUG: 1206587 Change-Id: I9ccd3e13afed1c7fc18508e92c7beb0f5d49f31a Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10017 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
*	Storage/posix : Adding error checks in path formation	Nithya Balachandran	2015-02-18	1	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Renaming directories can cause the size of the buffer required for posix_handle_path to increase between the first call, which calculates the size, and the second call which forms the path in the buffer allocated based on the size calculated in the first call. The path created in the second call overflows the allocated buffer and overwrites the stack causing the brick process to crash. The fix adds a buffer size check to prevent the buffer overflow. It also checks and returns an error if the posix_handle_path call is unable to form the path instead of working on the incomplete path, which is likely to cause subsequent calls using the path to fail with ELOOP. Preventing buffer overflow and handling errors BUG: 1113960 Change-Id: If3d3c1952e297ad14f121f05f90a35baf42923aa Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/9289 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
*	POSIX filesystem compliance: PATH_MAX	Emmanuel Dreyfus	2014-10-03	1	-4/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	POSIX mandates the filesystem to support paths of lengths up to _XOPEN_PATH_MAX (1024). This is the PATH_MAX limit here: http://pubs.opengroup.org/onlinepubs/009604499/basedefs/limits.h.html When using a path of 1023 bytes, the posix xlator attempts to create an absolute path by prefixing the 1023 bytes path by the brick base path. The result is an absolute path of more than _XOPEN_PATH_MAX bytes which may be rejected by the backend filesystem. Linux's ext3fs PATH_MAX seems to defaut to 4096, which means it will work (except if brick base path is longer than 2072 bytes but it is unlikely to happen. NetBSD's FFS PATH_MAX defaults to 1024, which means the bug can happen regardless of brick base path length. If this condition is detected for a brick, the proposed fix is to chdir() the brick glusterfsd daemon to its brick base directory. Then when encountering a path that will exceed _XOPEN_PATH_MAX once prefixed by the brick base path, a relative path is used instead of an absolute one. We do not always use relative path because some operations require an absolute path on the brick base path itself (e.g.: statvfs). At least on NetBSD, this chdir() uncovers a race condition which causes file lookup to fail with ENODATA for a few seconds. The volume quickly reaches a sane state, but regression tests are fast enough to choke on it. The reason is obscure (as often with race conditions), but sleeping one second after the chdir() seems to change scheduling enough that the problem disapear. Note that since the chdir() is done if brick backend filesystem does not support path long enough, it will not occur with Linux ext3fs (except if brick base path is over 2072 bytes long). BUG: 1129939 Change-Id: I7db3567948bc8fa8d99ca5f5ba6647fe425186a9 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/8596 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Tested-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	glusterd/quota: Heal pgfid xattr on existing data when the quota is	vmallika	2014-09-30	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	enable The pgfid extended attributes are used to construct the ancestry path (from the file to the volume root) for nameless lookups on files. As NFS relies on nameless lookups heavily, quota enforcement through NFS would be inconsistent if quota were to be enabled on a volume with existing data. Solution is to heal the pgfid extended attributes as a part of lookup perfomed by quota-crawl process. In a posix lookup check for pgfid xattr and if it is missing set the xattr. Change-Id: I5912ea96787625c496bde56d43ac9162596032e9 BUG: 1147378 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/8878 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	Always check for ENODATA with ENOATTR	Emmanuel Dreyfus	2014-09-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Linux defines ENODATA and ENOATTR with the same value, which means that code can miss on on the two without breaking. FreeBSD does not have ENODATA and GlusterFS defines it as ENOATTR just like Linux does. On NetBSD, ENODATA != ENOATTR, hence we need to check for both values to get portable behavior. BUG: 764655 Change-Id: I003a3af055fdad285d235f2a0c192c9cce56fab8 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/8447 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
*	storage/posix: Prefer gfid links for inode-handle	Pranith Kumar K	2014-09-02	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: File path could change by other entry operations in-flight so if renames are in progress at the time of other operations like open, it may lead to failures. We observed that this issue can also happen while renames and readdirps/lookups are in progress because dentry-table is going stale sometimes. Fix: Prefer gfid-handles over paths for files. For directory handles prefering gfid-handles hits performance issues because it needs to resolve paths traversing up the symlinks. Tests which test if files are opened should check on gfid path after this change. So changed couple of tests to reflect the same. Note: This patch doesn't fix the issue for directories. I think a complete fix is to come up with an entry operation serialization xlator. Until then lets live with this. Change-Id: I10bda1083036d013f3a12588db7a71039d9da6c3 BUG: 1136159 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8575 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	storage/posix: do not dereference gfid symlinks before ↵	Xavier Hernandez	2014-05-02	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	posix_handle_mkdir_hashes() Whenever a new directory is created, its corresponding gfid file must also be created. This was done first calling MAKE_HANDLE_PATH() to get the path of the gfid file, then calling posix_handle_mkdir_hashes() to create the parent directories of the gfid, and finally creating the soft-link. In normal circumstances, the gfid we want to create won't exist and MAKE_HANDLE_PATH() will return a simple path to the new gfid. However if the volume is damaged and a self-heal is running, it is possible that we try to create an already existing gfid. In this case, MAKE_HANDLE_PATH() will return a path to the directory instead of the path to the gfid. To solve this problem, every time a path to a gfid is needed, a call to MAKE_HANDLE_ABSPATH() is made instead of the call to MAKE_HANDLE_PATH(). Change-Id: Ic319cc38c170434db8e86e2f89f0b8c28c0d611a BUG: 859581 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/5075 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	storage/posix: HANDLE_PFX is redundant use GF_HIDDEN_PATH instead	Harshavardhana	2014-01-22	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	GF_HIDDEN_PATH usage would help in better readability of the code and avoids bugs produced from redundant macro constants. Change-Id: I2fd7e92e87783ba462ae438ced2cf4f720a25f5c BUG: 990028 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/6756 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	posix: placeholders for GFID to path conversion	Raghavendra G	2013-11-26	1	-5/+91
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	what? ===== The following is an attempt to generate the paths of a file when only its gfid is known. To find the path of a directory, the symlink handle to the directory maintained in the ".glusterfs" backend directory is read. The symlink handle is generated using the gfid of the directory. It (handle) contains the directory's name and parent gfid, which are used to recursively construct the absolute path as seen by the user from the mount point. A similar approach cannot be used for a regular file or a symbolic link since its hardlink handle, generated using its gfid, doesn't contain its parent gfid and basename. So xattrs are set to store the parent gfids and the number of hardlinks to a file or a symlink having the same parent gfid. When an user/application requests for the paths of a regular file or a symlink with multiple hardlinks, using the parent gfids stored in the xattrs, the paths of the parent directories are generated as mentioned earlier. The base names of the hardlinks (with the same parent gfid) are determined by matching the actual backend inode numbers of each entry in the parent directory with that of the hardlink handle. Xattr is set on a regular file, link, and symbolic link as follows, Xattr name : trusted.pgfid.<pargfidstr> Xattr value : <number of hardlinks to a regular file/symlink with the same parentgfid> If a regular file, hard link, symbolic link is created then an xattr in the above format is set in the backend. how to use? =========== This functionality can be used through getxattr interface. Two keys - glusterfs.ancestry.dentry and glusterfs.ancestry.path - enable usage of this functionality. A successful getxattr will have the result stored under same keys. Values will be, glusterfs.ancestry.dentry: -------------------------- A linked list of gf-dirent structures for all possible paths from root to this gfid. If there are multiple paths, the linked-list will be a series of paths one after another. Each path will be a series of dentries representing all components of the path. This key is primarily for internal usage within glusterfs. glusterfs.ancestry.path: ------------------------ A string containing all possible paths from root to this gfid. Multiple hardlinks of a file or a symlink are displayed as a colon seperated list (this could interfere with path components containing ':'). e.g. If there is a file "file1" in root directory with two hardlinks, "/dir2/link2tofile1" and "/dir1/link1tofile1", then [root@alpha gfsmntpt]# getfattr -n glusterfs.ancestry.path -e text file1 glusterfs.ancestry.path="/file1:/dir2/link2tofile1:/dir1/link1tofile1" Thanks Amar, Avati and Venky for the inputs. Original Author: Ramana Raja <rraja@redhat.com> BUG: 990028 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: I0eaa9101e333e0c1f66ccefd9e95944dd4a27497 Reviewed-on: http://review.gluster.org/5951 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	All: License message change	Varun Shastry	2012-09-13	1	-7/+6
\| \| \| \| \| \| \| \| \| \| \| \|	License message changed for server-side, dual license GPLV2 and LGPLv3+. Change-Id: Ia9e53061b9d2df3b3ef3bc9778dceff77db46a09 BUG: 852318 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/3940 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	All: License message change	Varun Shastry	2012-08-28	1	-16/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The license message is changed to Copyright (c) 2008-2012 Red Hat, Inc. <http://www.redhat.com> This file is part of GlusterFS. This file is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation. Change-Id: I07d2b63ed5fbbbd1884f1e74f2dd56013d15b0f4 BUG: 852318 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/3858 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	storage/posix: Move landfill inside .glusterfs	Pranith Kumar K	2012-05-31	1	-0/+2
\| \| \| \| \| \| \| \| \|	Change-Id: Ia2944f891dd62e72f3c79678c3a1fed389854a90 BUG: 811970 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3158 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	storage/posix: prefer absolute path handles over GFID handlesv3.3.0qa37	Anand Avati	2012-04-21	1	-12/+12
\| \| \| \| \| \| \| \| \| \|	Change-Id: I9afefa2f8a39c5f2c77271ea64aff95249c88821 BUG: 791187 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.com/3204 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
*	posix: handle some internal behavior in posix_mknod()	Amar Tumballi	2012-02-16	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	assume a case of link() systemcall, which is handled in distribute by creating a 'linkfile' in hashed subvolume, if the 'oldloc' is present in different subvolume. we have same 'gfid' for the linkfile as that of file for consistency. Now, a file with multiple hardlinks, we may end up with 'hardlinked' linkfiles. dht create linkfile using 'mknod()' fop, and as now posix_mknod() is not equipped to handle this situation. this patch fixes the situation by looking at the 'internal' key set in the dictionary to differentiate the call which originates from inside with regular system calls. Change-Id: Ibff7c31f8e0c8bdae035c705c93a295f080ff985 BUG: 763844 Signed-off-by: Amar Tumballi <amar@gluster.com> Reviewed-on: http://review.gluster.com/2755 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	core: GFID filehandle based backend and anonymous FDs	Anand Avati	2012-01-20	1	-0/+148
	1. What -------- This change introduces an infrastructure change in the filesystem which lets filesystem operation address objects (inodes) just by its GFID. Thus far GFID has been a unique identifier of a user-visible inode. But in terms of addressability the only mechanism thus far has been the backend filesystem path, which could be derived from the GFID only if it was cached in the inode table along with the entire set of dentry ancestry leading up to the root. This change essentially decouples addressability from the namespace. It is no more necessary to be aware of the parent directory to address a file or directory. 2. Why ------- The biggest use case for such a feature is NFS for generating persistent filehandles. So far the technique for generating filehandles in NFS has been to encode path components so that the appropriate inode_t can be repopulated into the inode table by means of a recursive lookup of each component top-down. Another use case is the ability to perform more intelligent self-healing and rebalancing of inodes with hardlinks and also to detect renames. A derived feature from GFID filehandles is anonymous FDs. An anonymous FD is an internal USABLE "fd_t" which does not map to a user opened file descriptor or to an internal ->open()'d fd. The ability to address a file by the GFID eliminates the need to have a persistent ->open()'d fd for the purpose of avoiding the namespace. This improves NFS read/write performance significantly eliminating open/close calls and also fixes some of today's limitations (like keeping an FD open longer than necessary resulting in disk space leakage) 3. How ------- At each storage/posix translator level, every file is hardlinked inside a hidden .glusterfs directory (under the top level export) with the name as the ascii-encoded standard UUID format string. For reasons of performance and scalability there is a two-tier classification of those hardlinks under directories with the initial parts of the UUID string as the directory names. For directories (which cannot be hardlinked), the approach is to use a symlink which dereferences the parent GFID path along with basename of the directory. The parent GFID dereference will in turn be a dereference of the grandparent with the parent's basename, and so on recursively up to the root export. 4. Development --------------- 4a. To leverage the ability to address an inode by its GFID, the technique is to perform a "nameless lookup". This means, to populate a loc_t structure as: loc_t { pargfid: NULL parent: NULL name: NULL path: NULL gfid: GFID to be looked up [out parameter] inode: inode_new () result [in parameter] } and performing such lookup will return in its callback an inode_t populated with the right contexts and a struct iatt which can be used to perform an inode_link () on the inode (without a parent and basename). The inode will now be hashed and linked in the inode table and findable via inode_find(). A fundamental change moving forward is that the primary fields in a loc_t structure are now going to be (pargfid, name) and (gfid) depending on the kind of FOP. So far path had been the primary field for operations. The remaining fields only serve as hints/helpers. 4b. If read/write is to be performed on an inode_t, the approach so far has been to: fd_create(), STACK_WIND(open, fd), fd_bind (in callback) and then perform STACK_WIND(read, fd) etc. With anonymous fds now you can do fd_anonymous (inode), STACK_WIND (read, fd). This results in great boost in performance in the inbuilt NFS server. 5. Misc ------- The inode_ctx_put[2] has been renamed to inode_ctx_set[2] to be consistent with the rest of the codebase. Change-Id: Ie4629edf6bd32a595f4d7f01e90c0a01f16fb12f BUG: 781318 Reviewed-on: http://review.gluster.com/669 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>