summaryrefslogtreecommitdiffstats
path: root/xlators/storage/posix/src/posix-inode-fd-ops.c
Commit message (Collapse)AuthorAgeFilesLines
* posix: stack-buffer-overflow reported by asanHarpreet Kaur2018-12-261-1/+3
| | | | | | | | | | | | This patch fixes buffer overflow in $SRC/xlators/storage/posix/src/posix-inode-fd-ops.c Memory access at offset 432 overflows "md5_checksum" variable. SUMMARY: AddressSanitizer: stack-buffer-overflow (/lib64/libasan.so.5+0xb825a) updates: bz#1633930 Change-Id: I46010a09161d02cdf0c69679a334ec1d3d49cffb Signed-off-by: Harpreet Kaur <hlalwani@redhat.com>
* posix: use synctask for janitorPoornima G2018-12-191-16/+21
| | | | | | | | | | | | | | With brick mux, the number of threads increases as the number of bricks increases. As an initiative to reduce the number of threads in brick mux scenario, replacing janitor thread to use synctask infra. Now close() and closedir() handle by separate janitor thread which is linked with glusterfs_ctx. Updates #475 Change-Id: I0c4aaf728125ab7264442fde59f3d08542785f73 Signed-off-by: Poornima G <pgurusid@redhat.com>
* copy_file_range support in GlusterFSRaghavendra Bhat2018-12-121-0/+268
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * libglusterfs changes to add new fop * Fuse changes: - Changes in fuse bridge xlator to receive and send responses * posix changes to perform the op on the backend filesystem * protocol and rpc changes for sending and receiving the fop * gfapi changes for performing the fop * tools: glfs-copy-file-range tool for testing copy_file_range fop - Although, copy_file_range support has been added to the upstream fuse kernel module, no release has been made yet of a kernel which contains the support. It is expected to come in the upcoming release of linux-4.20 So, as of now, executing copy_file_range fop on a fused based filesystem results in fuse kernel module sending read on the source fd and write on the destination fd. Therefore a small gfapi based tool has been written to be able test the copy_file_range fop. This tool is similar (in functionality) to the example program given in copy_file_range man page. So, running regular copy_file_range on a fuse mount point and running gfapi based glfs-copy-file-range tool gives some idea about how fast, the copy_file_range (or reflink) can be. On the local machine this was the result obtained. mount -t glusterfs workstation:new /mnt/glusterfs [root@workstation ~]# cd /mnt/glusterfs/ [root@workstation glusterfs]# ls file [root@workstation glusterfs]# cd [root@workstation ~]# time /tmp/a.out /mnt/glusterfs/file /mnt/glusterfs/new real 0m6.495s user 0m0.000s sys 0m1.439s [root@workstation ~]# time glfs-copy-file-range $(hostname) new /tmp/glfs.log /file /rrr OPEN_SRC: opening /file is success OPEN_DST: opening /rrr is success FSTAT_SRC: fstat on /rrr is success copy_file_range successful real 0m0.309s user 0m0.039s sys 0m0.017s This tool needs following arguments 1) hostname 2) volume name 3) log file path 4) source file path (relative to the gluster volume root) 5) destination file path (relative to the gluster volume root) "glfs-copy-file-range <hostname> <volume> <log file path> <source> <destination>" - Added a testcase as well to run glfs-copy-file-range tool * io-stats changes to capture the fop for profiling * NOTE: - Added conditional check to see whether the copy_file_range syscall is available or not. If not, then return ENOSYS. - Added conditional check for kernel minor version in fuse_kernel.h and fuse-bridge while referring to copy_file_range. And the kernel minor version is kept as it is. i.e. 24. Increment it in future when there is a kernel release which contains the support for copy_file_range fop in fuse kernel module. * The document which contains a writeup on this enhancement can be found at https://docs.google.com/document/d/1BSILbXr_knynNwxSyyu503JoTz5QFM_4suNIh2WwrSc/edit Change-Id: I280069c814dd21ce6ec3be00a884fc24ab692367 updates: #536 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* libglusterfs: Move devel headers under glusterfs directoryShyamsundarR2018-12-051-18/+18
| | | | | | | | | | | | | | | | | | | | | | | | libglusterfs devel package headers are referenced in code using include semantics for a program, this while it works can be better especially when dealing with out of tree xlator builds or in general out of tree devel package usage. Towards this, the following changes are done, - moved all devel headers under a glusterfs directory - Included these headers using system header notation <> in all code outside of libglusterfs - Included these headers using own program notation "" within libglusterfs This change although big, is just moving around the headers and making it correct when including these headers from other sources. This helps us correctly include libglusterfs includes without namespace conflicts. Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b Updates: bz#1193929 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* posix: fix memory leakAmar Tumballi2018-11-281-7/+16
| | | | | | | | | | | | | | | | | | | | | | | Direct leak of 609960 byte(s) in 4485 object(s) allocated from: #0 0x7f0d719bea50 in __interceptor_calloc (/lib64/libasan.so.5+0xefa50) #1 0x7f0d716dc08f in __gf_calloc ../../../libglusterfs/src/mem-pool.c:111 #2 0x7f0d5d41d9b2 in __posix_get_mdata_xattr ../../../../../xlators/storage/posix/src/posix-metadata.c:240 #3 0x7f0d5d41dd6b in posix_get_mdata_xattr ../../../../../xlators/storage/posix/src/posix-metadata.c:317 #4 0x7f0d5d39e855 in posix_fdstat ../../../../../xlators/storage/posix/src/posix-helpers.c:685 #5 0x7f0d5d3d65ec in posix_create ../../../../../xlators/storage/posix/src/posix-entry-ops.c:2173 Direct leak of 609960 byte(s) in 4485 object(s) allocated from: #0 0x7f0d719bea50 in __interceptor_calloc (/lib64/libasan.so.5+0xefa50) #1 0x7f0d716dc08f in __gf_calloc ../../../libglusterfs/src/mem-pool.c:111 #2 0x7f0d5d41ced2 in posix_set_mdata_xattr ../../../../../xlators/storage/posix/src/posix-metadata.c:359 #3 0x7f0d5d41e70f in posix_set_ctime ../../../../../xlators/storage/posix/src/posix-metadata.c:616 #4 0x7f0d5d3d662c in posix_create ../../../../../xlators/storage/posix/src/posix-entry-ops.c:2181 We were freeing only the first context in inode during forget, and not the second. updates: bz#1633930 Change-Id: Ib61b4453aa3d2039d6ce660f52ef45390539b9db Signed-off-by: Amar Tumballi <amarts@redhat.com>
* posix: Fix null pointer dererfenceSusant Palai2018-11-141-0/+7
| | | | | | | | CID: 1124799 1214618 Change-Id: Iff05180983fe9600be0a2ce015a137e4efb8f533 updates: bz#789278 Signed-off-by: Susant Palai <spalai@redhat.com>
* posix: Fix coverity issueVarsha Rao2018-11-021-2/+6
| | | | | | | | | This patch fixes the unchecked return value, coverity issue. CID: 1391412 Change-Id: If85f4afdf8c6d37602c62fbf4d7c730e18be81e7 updates: bz#789278 Signed-off-by: Varsha Rao <varao@redhat.com>
* posix : fix coverity issues in posix-inode-fd-ops.cSunny Kumar2018-10-251-2/+2
| | | | | | | | This patch fixes CID: 1356526 and 1382369 : Argument cannot be negative Change-Id: I1aab5be2d217479db9f67a26b62854a0b38c1747 updates: bz#789278 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* posix: fill glusterfs.posix.* acl xattrs at dictKinglong Mee2018-10-221-1/+1
| | | | | | Change-Id: I0730a037f96c4386c72ecf2f61c71ec17ffbc1b0 Updates: bz#1634220 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* posix: return more xattrs at getxattr/fgetxattr when client requestKinglong Mee2018-10-171-2/+26
| | | | | | Change-Id: I37ac6186b3631979d2503d1b185a61b8094dbd0d Updates: bz#1634220 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* all: fix warnings on non 64-bits architecturesXavi Hernandez2018-10-101-3/+3
| | | | | | | | | | When compiling in other architectures there appear many warnings. Some of them are actual problems that prevent gluster to work correctly on those architectures. Change-Id: Icdc7107a2bc2da662903c51910beddb84bdf03c0 fixes: bz#1632717 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* storage/posix: Check if fd->inode is NULL before using itAshish Pandey2018-09-201-4/+4
| | | | | | | | | | | CID: 1395473, 1395472 https://scan6.coverity.com/reports.htm#v42607/p10714/fileInstanceId=85588219&defectInstanceId=26115956&mergedDefectId=1395472 https://scan6.coverity.com/reports.htm#v42607/p10714/fileInstanceId=85588219&defectInstanceId=26115961&mergedDefectId=1395473 Change-Id: I2c3cc350e0ac156616df6f568ba28dbfa68064bf updates: bz#789278 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* Land part 2 of clang-format changesGluster Ant2018-09-121-4287/+4288
| | | | | Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4 Signed-off-by: Nigel Babu <nigelb@redhat.com>
* posix: fix the duplicate macro definitionsAmar Tumballi2018-09-101-4/+0
| | | | | | | | | | the macro 'PATH_SET_TIMESPEC_OR_TIMEVAL' is defined in posix.h, which seems to be good place for it updates: bz#1193929 Change-Id: I521a2a6efeb26891c24637a5f06b1d90c00b1135 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* posix: disable open/read/write on special filesAmar Tumballi2018-09-061-0/+33
| | | | | | | | | | | | | | | | | | | In the file system, the responsibility w.r.to the block and char device files is related to only support for 'creating' them (using mknod(2)). Once the device files are created, the read/write syscalls for the specific devices are handled by the device driver registered for the specific major number, and depending on the minor number, it knows where to read from. Hence, we are at risk of reading contents from devices which are handled by the host kernel on server nodes. By disabling open/read/write on the device file, we would be safe with the bypass one can achieve from client side (using gfapi) Fixes: bz#1625096 Change-Id: I48c776b0af1cbd2a5240862826d3d8918601e47f Signed-off-by: Amar Tumballi <amarts@redhat.com>
* posix: remove not supported get/set contentAmar Tumballi2018-09-041-14/+0
| | | | | | | | | | | | | | getting and setting a file's content using extended attribute worked great as a GET/PUT alternative when an object storage is supported on top of Gluster. But it needs application changes, and also, it skips some caching layers. It is not used over years, and not supported any more. Remove the dead code. Fixes: bz#1625102 Change-Id: Ide3b3f1f644f6ca58558bbe45561f346f96b95b7 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* Multiple files: calloc -> mallocYaniv Kaul2018-09-041-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xlators/storage/posix/src/posix-inode-fd-ops.c: xlators/storage/posix/src/posix-helpers.c: xlators/storage/bd/src/bd.c: xlators/protocol/client/src/client-lk.c: xlators/performance/quick-read/src/quick-read.c: xlators/performance/io-cache/src/page.c xlators/nfs/server/src/nfs3-helpers.c xlators/nfs/server/src/nfs-fops.c xlators/nfs/server/src/mount3udp_svc.c xlators/nfs/server/src/mount3.c xlators/mount/fuse/src/fuse-helpers.c xlators/mount/fuse/src/fuse-bridge.c xlators/mgmt/glusterd/src/glusterd-utils.c xlators/mgmt/glusterd/src/glusterd-syncop.h xlators/mgmt/glusterd/src/glusterd-snapshot.c xlators/mgmt/glusterd/src/glusterd-rpc-ops.c xlators/mgmt/glusterd/src/glusterd-replace-brick.c xlators/mgmt/glusterd/src/glusterd-op-sm.c xlators/mgmt/glusterd/src/glusterd-mgmt.c xlators/meta/src/subvolumes-dir.c xlators/meta/src/graph-dir.c xlators/features/trash/src/trash.c xlators/features/shard/src/shard.h xlators/features/shard/src/shard.c xlators/features/marker/src/marker-quota.c xlators/features/locks/src/common.c xlators/features/leases/src/leases-internal.c xlators/features/gfid-access/src/gfid-access.c xlators/features/cloudsync/src/cloudsync-plugins/src/cloudsyncs3/src/libcloudsyncs3.c xlators/features/bit-rot/src/bitd/bit-rot.c xlators/features/bit-rot/src/bitd/bit-rot-scrub.c bxlators/encryption/crypt/src/metadata.c xlators/encryption/crypt/src/crypt.c xlators/performance/md-cache/src/md-cache.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible It doesn't make sense to calloc (allocate and clear) memory when the code right away fills that memory with data. It may be optimized by the compiler, or have a microscopic performance improvement. In some cases, also changed allocation size to be sizeof some struct or type instead of a pointer - easier to read. In some cases, removed redundant strlen() calls by saving the result into a variable. 1. Only done for the straightforward cases. There's room for improvement. 2. Please review carefully, especially for string allocation, with the terminating NULL string. Only compile-tested! .. and allocate memory as much as needed. xlators/nfs/server/src/mount3.c : Don't blindly allocate PATH_MAX, but strlen() the string and allocate appropriately. Also, align error messges. updates: bz#1193929 Original-Author: Yaniv Kaul <ykaul@redhat.com> Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: Ibda6f33dd180b7f7694f20a12af1e9576fe197f5
* multiple xlators (storage/posix): strncpy()->sprintf(), reduce strlen()'sYaniv Kaul2018-08-311-27/+29
| | | | | | | | | | | | | | | | | | | | | | | | xlators/storage/posix/src/posix-gfid-path.c xlators/storage/posix/src/posix-handle.c xlators/storage/posix/src/posix-helpers.c xlators/storage/posix/src/posix-inode-fd-ops.c xlators/storage/posix/src/posix.h strncpy may not be very efficient for short strings copied into a large buffer: If the length of src is less than n, strncpy() writes additional null bytes to dest to ensure that a total of n bytes are written. Instead, use snprintf(). Also: - save the result of strlen() and re-use it when possible. - move from strlen to SLEN (sizeof() ) for const strings. Compile-tested only! Change-Id: I056939f111a4ec6bc8ebd539ebcaf9eb67da6c95 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* clang-scan: fix multiple issuesAmar Tumballi2018-08-311-5/+12
| | | | | | | | | | | * Buffer overflow issue in glusterfsd * Null argument passed to function expecting non-null (event-epoll) * Make sure the op_ret value is set in macro (posix) Updates: bz#1622665 Change-Id: I32b378fc40a5e3ee800c0dfbc13335d44c9db9ac Signed-off-by: Amar Tumballi <amarts@redhat.com>
* multiple files: remove unndeeded memset()Yaniv Kaul2018-08-291-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a squash of multiple commits: contrib/fuse-lib/misc.c: remove unneeded memset() All flock variables are properly set, no need to memset it. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I8e0512c5a88daadb0e587f545fdb9b32ca8858a2 libglusterfs/src/{client_t|fd|inode|stack}.c: remove some memset() I don't think there's a need for any of them. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I2be9ccc3a5cb5da51a92af73488cdabd1c527f59 libglusterfs/src/xlator.c: remove unneeded memset() All xl->mem_acct members are properly set, no need to memset it. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I7f264cd47e7a06255a3f3943c583de77ae8e3147 xlators/cluster/afr/src/afr-self-heal-common.c: remove unneeded memset() Since we are going over the whole array anyway, initialize it properly, to either 1 or 0. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: Ied4210388976b6a7a2e91cc3de334534d6fef201 xlators/cluster/dht/src/dht-common.c: remove unneeded memset() Since we are going over the whole array anyway it is initialized properly. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: Idc436d2bd0563b6582908d7cbebf9dbc66a42c9a xlators/cluster/ec/src/ec-helpers.c: remove unneeded memset() Since we are going over the whole array anyway it is initialized properly. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I81bf971f7fcecb4599e807d37f426f55711978fa xlators/mgmt/glusterd/src/glusterd-volgen.c: remove some memset() I don't think there's a need for any of them. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I476ea59ba53546b5153c269692cd5383da81ce2d xlators/mgmt/glusterd/src/glusterd-geo-rep.c: read() in 4K blocks The current 1K seems small. 4K is usually better (in Linux). Also remove a memset() that I don't think is needed between reads. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I5fb7950c92d282948376db14919ad12e589eac2b xlators/storage/posix/src/posix-{gfid-path|inode-fd-ops}.c: remove memset() before sys_*xattr() functions. I don't see a reason to memset the array sent to the functions sys_llistxattr(), sys_lgetxattr(), sys_lgetxattr(), sys_flistxattr(), sys_fgetxattr(). (Note: it's unclear to me why we are calling sys_*txattr() functions with XATTR_VAL_BUF_SIZE-1 size instead of XATTR_VAL_BUF_SIZE ). Only compile-tested! Change-Id: Ief2103b56ba6c71e40ed343a93684eef6b771346 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* posix: prevent crash when SEEK_DATA/HOLE is not supportedXavi Hernandez2018-08-031-2/+4
| | | | | | | | | Instead of not defining the 'seek' fop when it's not supported on the compilation platform, we simply return EINVAL when it's used. Fixes: bz#1611834 Change-Id: I253666d8910c5e2fffa3a3ba37085e5c1c058a8e Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* afr: switch lk_owner only when pre-op succeedsRavishankar N2018-07-181-5/+0
| | | | | | | | | | | | | | | | | | | Problem: In a disk full scenario, we take a failure path in afr_transaction_perform_fop() and go to unlock phase. But we change the lk-owner before that, causing unlock to fail. When mount issues another fop that takes locks on that file, it hangs. Fix: Change lk-owner only when we are about to perform the fop phase. Also fix the same issue for arbiters when afr_txn_arbitrate_fop() fails the fop. Also removed the DISK_SPACE_CHECK_AND_GOTO in posix_xattrop. Otherwise truncate to zero will fail pre-op phase with ENOSPC when the user is actually trying to freee up space. Change-Id: Ic4c8a596b4cdf4a7fc189bf00b561113cf114353 fixes: bz#1602236 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* md-cache: Do not invalidate cache post set/remove xattrPoornima G2018-07-111-42/+43
| | | | | | | | | | | | | | | Since setxattr and removexattr fops cbk do not carry poststat, the stat cache was being invalidated in setxatr/remoxattr cbk. Hence the further lookup wouldn't be served from cache. To prevent this invalidation, md-cache is modified to get the poststat in set/removexattr_cbk in dict. Co-authored with Xavi Hernandez. Change-Id: I6b946be2d20b807e2578825743c25ba5927a60b4 fixes: bz#1586018 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Signed-off-by: Poornima G <pgurusid@redhat.com>
* posix: Do not log ENXIO errors for seek fopXavi Hernandez2018-07-101-2/+3
| | | | | | | | | | | | | When lseek is used with SEEK_DATA and SEEK_HOLE, it's expected that the last operation fails with ENXIO when offset is beyond the end of file. In this case it doesn't make sense to report this as an error log message. This patch reports ENXIO failure messages for seek fops in debug level instead of error level. Change-Id: I62a4f61f99b0e4d7ea6a2cdcd40afe15072794ac fixes: bz#1598926 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* storage/posix: Add warning logs on failureN Balachandran2018-07-021-2/+12
| | | | | | | | | | | posix_readdirp_fill will fail to update the iatt information if posix_handle_path fails. There is currently no log message to indicate this making debugging difficult. Change-Id: I6bce360ea7d1696501637433f80e02794fe1368f updates: bz#1564071 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* ctime: Fix self heal of symlink in EC volumeKotresh HR2018-06-201-2/+0
| | | | | | | | | | | | | | | | | | | | Since IEEE Std 1003.1-2001 does not require any association of file times with symbolic links, there is no requirement that file times be updated by readlink() states [1]. stat on symlink file was generating a readlink fop on one of the subvolumes of ec set which in turn updates atime on that subvolume. This causes mdata xattr to be different across ec set and hence self heal fails. So based on [1], atime is no longer updated by readlink fop. [1] http://pubs.opengroup.org/onlinepubs/009695399/functions/readlink.html fixes: bz#1592509 Change-Id: I08bd3ca3bdb222bd18160b1aa58fc2f7630c8083 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* storage/posix: Handle ENOSPC correctly in zero_fillPranith Kumar K2018-06-141-1/+22
| | | | | | Change-Id: Icc521d86cc510f88b67d334b346095713899087a fixes: bz#1590710 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* dht: Delete MDS internal xattr from dict in dht_getxattr_cbkMohit Agrawal2018-06-031-31/+0
| | | | | | | | | | | | | | Problem: At the time of fetching xattr to heal xattr by afr it is not able to fetch xattr because posix_getxattr has a check to ignore if xattr name is MDS Solution: To ignore same xattr update a check in dht_getxattr_cbk instead of having a check in posix_getxattr BUG: 1584098 Change-Id: I86cd2b2ee08488cb6c12f407694219d57c5361dc fixes: bz#1584098 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* storage/posix: use proper FOP for unwinding readdir(p)Raghavendra Bhat2018-05-241-3/+8
| | | | | | | | | As of now, even for readdirp, posix is unwinding with readdir signature. Change-Id: I6440c8a253c5d78bbcc97043e4e6e208e3d47cd1 fixes: bz#1581345 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* posix: use the ctime framework to handle setattr ctime payloadCsaba Henk2018-05-181-5/+16
| | | | | | | | | | Work on #208 having been been merged, we have obtained means to associate arbitrary ctimes with files, so we can handle setattr ctime payload with proper semantics. Updates: #435 Change-Id: I7302a3ee2574ca9bba605c7a8586c16c452f82c1 Signed-off-by: Csaba Henk <csaba@redhat.com>
* posix/ctime: posix hook to set ctime xattr in relevant fopsKotresh HR2018-05-061-3/+52
| | | | | | | | | | | This patch uses the ctime posix APIs to set consistent time across replica on disk. It also stores the time attributes in the inode context. Credits: Rafi KC <rkavunga@redhat.com> Updates: #208 Change-Id: I1a8d74d1e251f1d6d142f066fc99258025c0bcdd Signed-off-by: Kotresh HR <khiremat@redhat.com>
* posix/ctime: posix hooks to get consistent time xattrKotresh HR2018-05-061-30/+35
| | | | | | | | | | | | This patch uses the ctime posix APIs to get consistent time across replica. The time attributes are got from from inode context or from on disk if not found and merged with iatt to be returned. Credits: Rafi KC <rkavunga@redhat.com> Updates: #208 Change-Id: Id737038ce52468f1f5ebc8a42cbf9c6ffbd63850 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* fuse: add support for kernel writeback cacheCsaba Henk2018-05-041-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Added kernel-writeback-cache command line and xlator option for requesting utilisation of the writeback cache of the kernel in FUSE_INIT (see [1]). - Added attr-times-granularity command line and xlator option via which granularity of the {a,m,c}time in stat (attr) data that we support can be indicated to kernel. This is a means to avoid divergence of the attr times between kernel and userspace that could occur with writeback-cache, while still maintaining maximum time precision the FUSE server is capable of (see [2]). - Handling FATTR_CTIME flag in FUSE_SETATTR that indicates presence of ctime in setattr payload. Currently we cannot associate arbitrary ctimes to files on backend, so we just touch them to update their ctimes to current time. Having ctimes in setattr payload is also a side effect of writeback cache (see [3] and [4]). [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d99ff8, "fuse: Turn writeback cache on" [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e27c9d3, "fuse: fuse: add time_gran to INIT_OUT" [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1e18bda, "fuse: add .write_inode" [4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab9e13f, "fuse: allow ctime flushing to userspace" Updates: #435 Change-Id: Id174c8e0c815c4456c35f8c53e41a6a507d91855 Signed-off-by: Csaba Henk <csaba@redhat.com>
* posix: Avoid changelog retries for geo-repMohit Agrawal2018-05-031-0/+33
| | | | | | | | | | | | | | | Problem: georep is slowdown to migrate directory from master volume to slave volume due to lot of changelog retries Solution: Update the condition in posix_getxattr to ignore MDS_INTERNAL_XATTR as it(posix) ignored other internal xattrs BUG: 1571069 Change-Id: I4d91ec73e5b1ca1cb3ecf0825ab9f49e261da70e fixes: bz#1571069 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* Revert "storage/posix: add pgfid in readdirp if needed"Nigel Babu2018-04-181-38/+8
| | | | | | | | This reverts commit d206fab73f6815c927a84171ee9361c9b31557b1. Change-Id: I5b43fdcf916bc844437c9d60f6957bc40936e3c2 Updates: bz#1560319 Signed-off-by: Nigel Babu <nigelb@redhat.com>
* posix: reserve option behavior is not correct while using fallocateMohit Agrawal2018-04-111-0/+9
| | | | | | | | | | | | | | | | | Problem: storage.reserve option is not working correctly while disk space is allocate throguh fallocate Solution: In posix_disk_space_check_thread_proc after every 5 sec interval it calls posix_disk_space_check to monitor disk space and set the flag in posix priv.In 5 sec timestamp user can create big file with fallocate that can reach posix reserve limit and no error is shown on terminal even limit has reached. To resolve the same call posix_disk_space for every fallocate fop instead to call by a thread after 5 second BUG: 1560411 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Change-Id: I39ba9390e2e6d084eedbf3bcf45cd6d708591577
* storage/posix: add pgfid in readdirp if neededKinglong Mee2018-04-101-8/+38
| | | | | | Change-Id: I6745428fd9d4e402bf2cad52cee8ab46b7fd822f fixes: bz#1560319 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* posix: check file state before continuing with fopsSusant Palai2018-04-101-14/+252
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In context of Cloudsync: In scenarios where a data modification fop e.g. a write landed in POSIX thinking that the file is local, while the file was actually remote, can be dangerous. Ofcourse we don’t want to take inodelk for every read/write operation to check the archival status or coordinate with an upload or a download of a file. To avoid inodelk, we will check the status of the file in POSIX it self, before we resume the fop. This helps us avoiding any races mentioned above. Now e.g. if a write reached POSIX for a file which was actually remote, it can check the status of the file and will get to know that the file is remote. It can error out with this status “remote” and cloudsync xlator will retry the same operation, once it finished downloading the file. This patch includes the setxattr changes to do the post processing of upload i.e. truncate and setting the remote xattr "trusted.glusterfs.cs.remote" to indicate the file is REMOTE Each file will have no xattr if the file is LOCAL, one remote xattr if the file is REMOTE and a combination of REMOTE and DOWNLOADING xattr if the file is getting downloaded. There is healing logic of these xattrs to recover from crash inconsitencies. Fixes: #387 Change-Id: Ie93c2d41aa8d6a798a39bdbef9d1669f057e5fdb Signed-off-by: Susant Palai <spalai@redhat.com>
* storage/posix: Add active-fd-count option in glusterPranith Kumar K2018-03-211-0/+12
| | | | | | | | | | | | | | | | | | | | Problem: when dd happens on sharded replicate volume all the writes on shards happen through anon-fd. When the writes don't come quick enough, old anon-fd closes and new fd gets created to serve the new writes. open-fd-count is decremented only after the fd is closed as part of fd_destroy(). So even when one fd is on the way to be closed a new fd will be created and during this short period it appears as though there are multiple fds opened on the file. AFR thinks another application opened the same file and switches off eager-lock leading to extra latency. Fix: Have a different option called active-fd whose life cycle starts at fd_bind() and ends just before fd_destroy() BUG: 1557932 Change-Id: I2e221f6030feeedf29fbb3bd6554673b8a5b9c94 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* glusterfsd: Memleak in glusterfsd process while brick mux is onMohit Agrawal2018-02-271-3/+12
| | | | | | | | | | | | | | | | | | Problem: At the time of stopping the volume while brick multiplex is enabled memory is not cleanup from all server side xlators. Solution: To cleanup memory for all server side xlators call fini in glusterfs_handle_terminate after send GF_EVENT_CLEANUP notification to top xlator. BUG: 1544090 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Note: Run all test-cases in separate build (https://review.gluster.org/19574) with same patch after enable brick mux forcefully, all test cases are passed. Change-Id: Ia10dc7f2605aa50f2b90b3fe4eb380ba9299e2fc
* posix/afr: handle backward compatibility for rchecksum fopRavishankar N2018-02-191-3/+23
| | | | | | | | | Added a volume option 'fips-mode-rchecksum' tied to op version 4. If not set, rchecksum fop will use MD5 instead of SHA256. updates: #230 Change-Id: Id8ea1303777e6450852c0bc25503cda341a6aec2 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/dht: avoid overwriting client writes during migrationSusant Palai2018-02-021-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | For more details on this issue see https://github.com/gluster/glusterfs/issues/308 Solution: This is a restrictive solution where a file will not be migrated if a client writes to it during the migration. This does not check if the writes from the rebalance and the client actually do overlap. If dht_writev_cbk finds that the file is being migrated (PHASE1) it will set an xattr on the destination file indicating the file was updated by a non-rebalance client. Rebalance checks if any other client has written to the dst file and aborts the file migration if it finds the xattr. updates gluster/glusterfs#308 Change-Id: I73aec28bc9dbb8da57c7425ec88c6b6af0fbc9dd Signed-off-by: Susant Palai <spalai@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Signed-off-by: N Balachandran <nbalacha@redhat.com>
* storage/posix: Set f_bfree to 0 if brick fullN Balachandran2018-01-151-1/+12
| | | | | | | | | Return 0 free blocks if the brick is full or has less than the reserved limit. Change-Id: I2c5feda0303d0f4abe5af22fac903011792b2dc8 BUG: 1533736 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Add migration checks to dht_(f)xattropN Balachandran2017-12-261-0/+1
| | | | | | | | | | | | The dht_(f)xattrop implementation did not implement migration phase1/phase2 checks which could cause issues with rebalance on sharded volumes. This does not solve the issue where fops may reach the target out of order. Change-Id: I2416fc35115e60659e35b4b717fd51f20746586c BUG: 1471031 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* rchecksum/fips: Replace MD5 usage to enable fips supportKotresh HR2017-12-211-2/+1
| | | | | | | | | rchecksum uses MD5 which is not fips compliant. Hence using sha256 for the same. Updates: #230 Change-Id: I7fad016fcc2a9900395d0da919cf5ba996ec5278 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* posix: Reorganize posix xlator to prepare for reuse with rioShyamsundarR2017-12-021-0/+4975
1. Split out entry and inode/fd based FOPs into separate files from posix.c 2. Split out common routines (init, fini, reconf, and such) into its own file, from posix.c 3. Retain just the method assignments in posix.c (such that posix2 for RIO can assign its own methods in the future for entry operations and such) 4. Based on the split in (1) and (2) split out posix-handle.h into 2 files, such that macros that are needed for inode ops are in one and rest are in the other If the split is done as above, posix2 can compile with its own entry ops, and hence not compile, the entry ops as split in (1) above. The split described in (4) can again help posix2 to define its own macros to make entry and inode handles, thus not impact existing POSIX xlator code. Noted problems - There are path references in certain cases where quota is used (in the xattr FOPs), and thus will fail on reuse in posix2, this needs to be handled when we get there. - posix_init does set root GFID on the brick root, and this is incorrect for posix2, again will need handling later when posix2 evolves based on this code (other init checks seem fine on current inspection) Merge of experimental branch patches with the following gerrit change-IDs > Change-Id: I965ce6dffe70a62c697f790f3438559520e0af20 > Change-Id: I089a4d9cf470c2f9c121611e8ef18dea92b2be70 > Change-Id: I2cec103f6ba8f3084443f3066bcc70b2f5ecb49a Fixes gluster/glusterfs#327 Change-Id: I0ccfa78559a7c5a68f5e861e144cf856f5c9e19c Signed-off-by: ShyamsundarR <srangana@redhat.com>