Diffstat (limited to 'doc')
47 files changed, 3149 insertions, 2691 deletions
diff --git a/doc/Makefile.am b/doc/Makefile.am index 1103b607dba..de68c20b4d7 100644 --- a/doc/Makefile.am +++ b/doc/Makefile.am @@ -1,6 +1,9 @@ EXTRA_DIST = glusterfs.8 mount.glusterfs.8 gluster.8 \ glusterd.8 glusterfsd.8 -man8_MANS = glusterfs.8 mount.glusterfs.8 gluster.8 glusterd.8 glusterfsd.8 +man8_MANS = glusterfs.8 mount.glusterfs.8 gluster.8 +if WITH_SERVER +man8_MANS += glusterd.8 glusterfsd.8 +endif CLEANFILES = diff --git a/doc/README.md b/doc/README.md index e057437fcba..6aa28642ef4 100644 --- a/doc/README.md +++ b/doc/README.md @@ -1,6 +1,14 @@ +## Developer Guide + +Gluster's contributors can check about the internals by visiting [Developer Guide Section](developer-guide). While it is not 'comprehensive', it can help you to get started. + +Also while coding, keep [Coding Standard](developer-guide/coding-standard.md) in mind. + +When you are ready to commit the changes, make sure you meet our [Commit message standard](developer-guide/commit-guidelines.md). + ## Admin Guide ## -The gluster administration guide is maintained at [github](https://github.com/gluster/glusterdocs). The browsable admin guide can be found [here](http://gluster.readthedocs.org/en/latest/Administrator%20Guide/README/). +The gluster administration guide is maintained at [github](https://github.com/gluster/glusterdocs). The browsable admin guide can be found [here](http://docs.gluster.org/en/latest/Administrator%20Guide/). The doc patch has to be sent against the above mentioned repository. @@ -10,7 +18,7 @@ The Gluster features which are 'in progress' or implemented can be found at [git ## Upgrade Guide ## -The gluster upgrade guide is maintained at [github](https://github.com/gluster/glusterdocs). The browsable upgrade guide can be found [here](http://gluster.readthedocs.org/en/latest/Upgrade-Guide/README/) +The gluster upgrade guide is maintained at [github](https://github.com/gluster/glusterdocs). The browsable upgrade guide can be found [here](http://docs.gluster.org/en/latest/Upgrade-Guide) The doc patch has to be sent against the above mentioned repository. diff --git a/doc/developer-guide/coredump-analysis.md b/doc/debugging/analyzing-regression-cores.md index 16fa9165fd0..5e10f41c6eb 100644 --- a/doc/developer-guide/coredump-analysis.md +++ b/doc/debugging/analyzing-regression-cores.md @@ -1,36 +1,35 @@ -This document explains how to analyze core-dumps obtained from regression -machines, with examples. -1) Download the core-tarball and extract it. -2) 'cd' into directory where the tarball is extracted. -~~~ -[root@atalur Downloads]# pwd -/home/atalur/Downloads -[root@atalur Downloads]# ls +# Analyzing Regression Cores +This document explains how to analyze core-dumps obtained from regression machines, with examples. +1. Download the core-tarball and extract it. +2. `cd` into directory where the tarball is extracted. +``` +[sh]# pwd +/home/user/Downloads +[sh]# ls build build-install-20150625_05_42_39.tar.bz2 lib64 usr -~~~ -3) Determine the core file you need to examine. There can be more than one core file. -You can list them from './build/install/cores' directory. -~~~ -[root@atalur Downloads]# ls build/install/cores/ +``` +3. Determine the core file you need to examine. There can be more than one core file. You can list them from './build/install/cores' directory. +``` +[sh]# ls build/install/cores/ core.9341 liblist.txt liblist.txt.tmp -~~~ +``` In case you are unsure which binary generated the core-file, executing 'file' command on it will help. 
-~~~ -[root@atalur Downloads]# file ./build/install/cores/core.9341 +``` +[sh]# file ./build/install/cores/core.9341 ./build/install/cores/core.9341: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/build/install/sbin/glusterfsd -s slave26.cloud.gluster.org --volfile-id patchy' -~~~ -As seen, the core file was generated by glusterfsd binary, and path to it is provided (/build/install/sbin/glusterfsd). -4) Now, run the following command on the core: -~~~ +``` +As seen, the core file was generated by glusterfsd binary, and path to it is provided (/build/install/sbin/glusterfsd). + +4. Now, run the following command on the core: +``` gdb -ex 'set sysroot ./' -ex 'core-file ./build/install/cores/core.xxx' <target, say ./build/install/sbin/glusterd> In this case, gdb -ex 'set sysroot ./' -ex 'core-file ./build/install/cores/core.9341' ./build/install/sbin/glusterfsd -~~~ -5) You can cross check if all shared libraries are available and loaded by using 'info sharedlibrary' command from -inside gdb. -6) Once verified, usual gdb commands based on requirement can be used to debug the core. -'bt' or 'backtrace' from gdb of core used in examples: -~~~ +``` +5. You can cross check if all shared libraries are available and loaded by using 'info sharedlibrary' command from inside gdb. +6. Once verified, usual gdb commands based on requirement can be used to debug the core. + `bt` or `backtrace` from gdb of core used in examples: +``` Core was generated by `/build/install/sbin/glusterfsd -s slave26.cloud.gluster.org --volfile-id patchy'. Program terminated with signal SIGABRT, Aborted. #0 0x00007f512a54e625 in raise () from ./lib64/libc.so.6 @@ -52,4 +51,4 @@ Program terminated with signal SIGABRT, Aborted. #12 0x00007f512a55f8f0 in ?? () from ./lib64/libc.so.6 #13 0x0000000000000000 in ?? () (gdb) -~~~ +``` diff --git a/doc/debugging/coredump-analysis.md b/doc/debugging/coredump-analysis.md deleted file mode 100644 index f9ecf73216e..00000000000 --- a/doc/debugging/coredump-analysis.md +++ /dev/null @@ -1,31 +0,0 @@ -This document explains how to analyze core-dumps obtained from regression -machines, with examples. -1) Download the core-tarball and extract it. -2) 'cd' into the root of extracted tarball. -~~~ -[root@atalur Downloads]# pwd -/home/atalur/Downloads -[root@atalur Downloads]# ls -build build-install-20150625_05_42_39.tar.bz2 lib64 usr -~~~ -3) Determine the core file you need to examine. There can be more than one core file. -You can list them from './build/install/cores' directory. -~~~ -[root@atalur Downloads]# ls build/install/cores/ -core.9341 liblist.txt liblist.txt.tmp -~~~ -In case you are unsure which binary generated the core-file, executing 'file' command on it will help. -~~~ -[root@atalur Downloads]# file ./build/install/cores/core.9341 -./build/install/cores/core.9341: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/build/install/sbin/glusterfsd -s slave26.cloud.gluster.org --volfile-id patchy' -~~~ -As seen, the core file was generated by glusterfsd binary, and path to it is provide (/build/install/sbin/glusterfsd). -4) Now, run the following command on the core: -~~~ -gdb -ex 'set sysroot ./' -ex 'core-file ./build/install/cores/core.xxx' <target, say ./build/install/sbin/glusterd> -In this case, -gdb -ex 'set sysroot ./' -ex 'core-file ./build/install/cores/core.9341' ./build/install/sbin/glusterfsd -~~~ -5) You can cross check if all shared libraries are available and loaded by using 'info sharedlibrary' command from -inside gdb. 
-6) Once verified, usual gdb commands based on requirement can be used to debug the core. diff --git a/doc/debugging/gfid-to-path.md b/doc/debugging/gfid-to-path.md index 09c459e52c8..1917bf2cca1 100644 --- a/doc/debugging/gfid-to-path.md +++ b/doc/debugging/gfid-to-path.md @@ -1,37 +1,37 @@ -#Convert GFID to Path +# Convert GFID to Path GlusterFS internal file identifier (GFID) is a uuid that is unique to each file across the entire cluster. This is analogous to inode number in a normal filesystem. The GFID of a file is stored in its xattr named `trusted.gfid`. -####Special mount using [gfid-access translator][1]: -~~~ +#### Special mount using [gfid-access translator][1]: +``` mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol -~~~ +``` Assuming, you have `GFID` of a file from changelog (or somewhere else). For trying this out, you can get `GFID` of a file from mountpoint: -~~~ +``` getfattr -n glusterfs.gfid.string /mnt/testvol/dir/file -~~~ +``` --- -###Get file path from GFID (Method 1): +### Get file path from GFID (Method 1): **(Lists hardlinks delimited by `:`, returns path as seen from mountpoint)** -####Turn on build-pgfid option -~~~ +#### Turn on build-pgfid option +``` gluster volume set test build-pgfid on -~~~ +``` Read virtual xattr `glusterfs.ancestry.path` which contains the file path -~~~ +``` getfattr -n glusterfs.ancestry.path -e text /mnt/testvol/.gfid/<GFID> -~~~ +``` **Example:** -~~~ +``` [root@vm1 glusterfs]# ls -il /mnt/testvol/dir/ total 1 10610563327990022372 -rw-r--r--. 2 root root 3 Jul 17 18:05 file @@ -46,28 +46,23 @@ glusterfs.gfid.string="11118443-1894-4273-9340-4b212fa1c0e4" getfattr: Removing leading '/' from absolute path names # file: mnt/testvol/.gfid/11118443-1894-4273-9340-4b212fa1c0e4 glusterfs.ancestry.path="/dir/file:/dir/file3" -~~~ +``` --- -###Get file path from GFID (Method 2): +### Get file path from GFID (Method 2): **(Does not list all hardlinks, returns backend brick path)** -~~~ +``` getfattr -n trusted.glusterfs.pathinfo -e text /mnt/testvol/.gfid/<GFID> -~~~ +``` **Example:** -~~~ +``` [root@vm1 glusterfs]# getfattr -n trusted.glusterfs.pathinfo -e text /mnt/testvol/.gfid/11118443-1894-4273-9340-4b212fa1c0e4 getfattr: Removing leading '/' from absolute path names # file: mnt/testvol/.gfid/11118443-1894-4273-9340-4b212fa1c0e4 trusted.glusterfs.pathinfo="(<DISTRIBUTE:test-dht> <POSIX(/mnt/brick-test/b):vm1:/mnt/brick-test/b/dir//file3>)" -~~~ +``` --- -###Get file path from GFID (Method 3): -https://gist.github.com/semiosis/4392640 - ---- -####References and links: +#### References and links: [posix: placeholders for GFID to path conversion](http://review.gluster.org/5951) -[1]: https://github.com/gluster/glusterfs/blob/master/doc/features/gfid-access.md diff --git a/doc/debugging/mem-alloc-list.md b/doc/debugging/mem-alloc-list.md new file mode 100644 index 00000000000..1c68e65d323 --- /dev/null +++ b/doc/debugging/mem-alloc-list.md @@ -0,0 +1,19 @@ +## Viewing Memory Allocations + +While statedumps provide stats of the number of allocations, size etc for a +particular mem type, there is no easy way to examine all the allocated objects of that type +in memory.Being able to view this information could help with determining how an object is used, +and if there are any memory leaks. + +The mem_acct_rec structures have been updated to include lists to which the allocated object is +added. These can be examined in gdb using simple scripts. 
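+For example (a hedged sketch; the process name is illustrative and a debug
+build via `configure --enable-debug` is assumed), one way to reach such a gdb
+session is:
+```
+# Attach gdb to a live brick process (assumes a single glusterfsd on the node);
+# alternatively, point gdb at the binary plus a core file as described in
+# analyzing-regression-cores.md.
+gdb -p $(pidof glusterfsd)
+```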
+ +`gdb> plist xl->mem_acct.rec[$type]->obj_list` + +will print out the pointers of all allocations of $type. + +These changes are primarily targeted at developers and need to enabled +at compile-time using `configure --enable-debug`. + + + diff --git a/doc/debugging/split-brain.md b/doc/debugging/split-brain.md index b0d938e26bc..6b122c40551 100644 --- a/doc/debugging/split-brain.md +++ b/doc/debugging/split-brain.md @@ -1,33 +1,36 @@ -Steps to recover from File split-brain. -====================================== - -Quick Start: -============ -1. Get the path of the file that is in split-brain: -> It can be obtained either by -> a) The command `gluster volume heal info split-brain`. -> b) Identify the files for which file operations performed - from the client keep failing with Input/Output error. - -2. Close the applications that opened this file from the mount point. +# Steps to recover from File split-brain +This document contains steps to recover from a file split-brain. +## Quick Start: +### Step 1. Get the path of the file that is in split-brain: +It can be obtained either by +1. The command `gluster volume heal info split-brain`. +2. Identify the files for which file operations performed from the client keep failing with Input/Output error. + +### Step 2. Close the applications that opened this file from the mount point. In case of VMs, they need to be powered-off. -3. Decide on the correct copy: -> This is done by observing the afr changelog extended attributes of the file on +### Step 3. Decide on the correct copy: +This is done by observing the afr changelog extended attributes of the file on the bricks using the getfattr command; then identifying the type of split-brain (data split-brain, metadata split-brain, entry split-brain or split-brain due to gfid-mismatch); and finally determining which of the bricks contains the 'good copy' of the file. -> `getfattr -d -m . -e hex <file-path-on-brick>`. +``` +getfattr -d -m . -e hex <file-path-on-brick> +``` + It is also possible that one brick might contain the correct data while the other might contain the correct metadata. -4. Reset the relevant extended attribute on the brick(s) that contains the -'bad copy' of the file data/metadata using the setfattr command. -> `setfattr -n <attribute-name> -v <attribute-value> <file-path-on-brick>` +### Step 4. Reset the relevant extended attribute on the brick(s) that contains the 'bad copy' of the file data/metadata using the setfattr command. +``` +setfattr -n <attribute-name> -v <attribute-value> <file-path-on-brick> +``` -5. Trigger self-heal on the file by performing lookup from the client: -> `ls -l <file-path-on-gluster-mount>` +### Step 5. Trigger self-heal on the file by performing lookup from the client: +``` +ls -l <file-path-on-gluster-mount> +``` Detailed Instructions for steps 3 through 5: =========================================== @@ -36,13 +39,15 @@ afr changelog extended attributes. Execute `getfattr -d -m . -e hex <file-path-on-brick>` -* Example: +Example: +``` [root@store3 ~]# getfattr -d -e hex -m. brick-a/file.txt \#file: brick-a/file.txt security.selinux=0x726f6f743a6f626a6563745f723a66696c655f743a733000 trusted.afr.vol-client-2=0x000000000000000000000000 trusted.afr.vol-client-3=0x000000000200000000000000 trusted.gfid=0x307a5c9efddd4e7c96e94fd4bcdcbd1b +``` The extended attributes with `trusted.afr.<volname>-client-<subvolume-index>` are used by afr to maintain changelog of the file.The values of the @@ -51,10 +56,11 @@ client (fuse or nfs-server) processes. 
When the glusterfs client modifies a file or directory, the client contacts each brick and updates the changelog extended attribute according to the response of the brick. -'subvolume-index' is nothing but (brick number - 1) in +`subvolume-index` is nothing but (brick number - 1) in `gluster volume info <volname>` output. -* Example: +Example: +``` [root@pranithk-laptop ~]# gluster volume info vol Volume Name: vol Type: Distributed-Replicate @@ -71,6 +77,7 @@ attribute according to the response of the brick. brick-f: pranithk-laptop:/gfs/brick-f brick-g: pranithk-laptop:/gfs/brick-g brick-h: pranithk-laptop:/gfs/brick-h +``` In the example above: ``` @@ -91,12 +98,15 @@ present in all the other bricks in it's replica set as seen by that brick. In the example volume given above, all files in brick-a will have 2 entries, one for itself and the other for the file present in it's replica pair, i.e.brick-b: +``` trusted.afr.vol-client-0=0x000000000000000000000000 -->changelog for itself (brick-a) trusted.afr.vol-client-1=0x000000000000000000000000 -->changelog for brick-b as seen by brick-a - +``` Likewise, all files in brick-b will have: +``` trusted.afr.vol-client-0=0x000000000000000000000000 -->changelog for brick-a as seen by brick-b trusted.afr.vol-client-1=0x000000000000000000000000 -->changelog for itself (brick-b) +``` The same can be extended for other replica pairs. @@ -122,7 +132,8 @@ When a file split-brain happens it could be either data split-brain or meta-data split-brain or both. When a split-brain happens the changelog of the file would be something like this: -* Example:(Lets consider both data, metadata split-brain on same file). +Example:(Lets consider both data, metadata split-brain on same file). +``` [root@pranithk-laptop vol]# getfattr -d -m . -e hex /gfs/brick-?/a getfattr: Removing leading '/' from absolute path names \#file: gfs/brick-a/a @@ -133,10 +144,11 @@ trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57 trusted.afr.vol-client-0=0x000003b00000000100000000 trusted.afr.vol-client-1=0x000000000000000000000000 trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57 +``` -###Observations: +### Observations: -####According to changelog extended attributes on file /gfs/brick-a/a: +#### According to changelog extended attributes on file /gfs/brick-a/a: The first 8 digits of trusted.afr.vol-client-0 are all zeros (0x00000000................), and the first 8 digits of trusted.afr.vol-client-1 are not all zeros (0x000003d7................). @@ -149,7 +161,7 @@ trusted.afr.vol-client-1 are not all zeros (0x........00000001........). So the changelog on /gfs/brick-a/a implies that some metadata operations succeeded on itself but failed on /gfs/brick-b/a. -####According to Changelog extended attributes on file /gfs/brick-b/a: +#### According to Changelog extended attributes on file /gfs/brick-b/a: The first 8 digits of trusted.afr.vol-client-0 are not all zeros (0x000003b0................), and the first 8 digits of trusted.afr.vol-client-1 are all zeros (0x00000000................). @@ -205,6 +217,7 @@ Hence execute `setfattr -n trusted.afr.vol-client-1 -v 0x000003d70000000000000000 /gfs/brick-a/a` Thus after the above operations are done, the changelogs look like this: +``` [root@pranithk-laptop vol]# getfattr -d -m . 
-e hex /gfs/brick-?/a getfattr: Removing leading '/' from absolute path names \#file: gfs/brick-a/a @@ -216,7 +229,7 @@ trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57 trusted.afr.vol-client-0=0x000000000000000100000000 trusted.afr.vol-client-1=0x000000000000000000000000 trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57 - +``` Triggering Self-heal: --------------------- @@ -243,9 +256,9 @@ needs to be removed.The gfid-link files are present in the .glusterfs folder in the top-level directory of the brick. If the gfid of the file is 0x307a5c9efddd4e7c96e94fd4bcdcbd1b (the trusted.gfid extended attribute got from the getfattr command earlier),the gfid-link file can be found at -> /gfs/brick-a/.glusterfs/30/7a/307a5c9efddd4e7c96e94fd4bcdcbd1b +`/gfs/brick-a/.glusterfs/30/7a/307a5c9efddd4e7c96e94fd4bcdcbd1b` -####Word of caution: +#### Word of caution: Before deleting the gfid-link, we have to ensure that there are no hard links to the file present on that brick. If hard-links exist,they must be deleted as well. diff --git a/doc/debugging/statedump.md b/doc/debugging/statedump.md index f34a5c3436a..9dfdce15fad 100644 --- a/doc/debugging/statedump.md +++ b/doc/debugging/statedump.md @@ -1,31 +1,53 @@ -#Statedump +# Statedump Statedump is a file generated by glusterfs process with different data structure state which may contain the active inodes, fds, mempools, iobufs, memory allocation stats of different types of datastructures per xlator etc. -##How to generate statedump -We can find the directory where statedump files are created using 'gluster --print-statedumpdir' command. +## How to generate statedump +We can find the directory where statedump files are created using `gluster --print-statedumpdir` command. Create that directory if not already present based on the type of installation. Lets call this directory `statedump-directory`. -We can generate statedump using 'kill -USR1 <pid-of-gluster-process>'. +We can generate statedump using `kill -USR1 <pid-of-gluster-process>`. gluster-process is nothing but glusterd/glusterfs/glusterfsd process. There are also commands to generate statedumps for brick processes/nfs server/quotad -For bricks: `gluster volume statedump <volname>` +For bricks: +``` +gluster volume statedump <volname> +``` -For nfs server: `gluster volume statedump <volname> nfs` +For nfs server: +``` +gluster volume statedump <volname> nfs +``` -For quotad: `gluster volume statedump <volname> quotad` +For quotad: +``` +gluster volume statedump <volname> quotad +``` For brick-processes files will be created in `statedump-directory` with name of the file as `hyphenated-brick-path.<pid>.dump.timestamp`. For all other processes it will be `glusterdump.<pid>.dump.timestamp`. -##How to read statedump +For applications using libgfapi, `SIGUSR1` cannot be used, eg: smbd/libvirtd +processes could have used the `SIGUSR1` signal already for other purposes. +To generate statedump for the processes, using libgfapi, below command can be +executed from one of the nodes in the gluster cluster to which the libgfapi +application is connected to. +``` +gluster volume statedump <volname> client <hostname>:<process id> +``` +The statedumps can be found in the `statedump-directory`, the name of the +statedumps being `glusterdump.<pid>.dump.timestamp`. For a process there can be +multiple such files created depending on the number of times the volume is +accessed by the process (related to the number of `glfs_init()` calls). + +## How to read statedump We shall see snippets of each type of statedump. 
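To locate a statedump to read (a small sketch; the dump directory varies by installation, so it is queried rather than assumed):
```
# Find where statedumps are written, then pick the newest one.
statedumpdir=$(gluster --print-statedumpdir)
ls -t "$statedumpdir"/*.dump.* | head -1
```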
First and last lines of the file have starting and ending time of writing the statedump file. Times will be in UTC timezone. mallinfo return status is printed in the following format. Please read man mallinfo for more information about what each field means. -###Mallinfo +### Mallinfo ``` [mallinfo] mallinfo_arena=100020224 /* Non-mmapped space allocated (bytes) */ @@ -40,7 +62,7 @@ mallinfo_fordblks=3310112 /* Total free space (bytes) */ mallinfo_keepcost=133712 /* Top-most, releasable space (bytes) */ ``` -###Data structure allocation stats +### Data structure allocation stats For every xlator data structure memory per translator loaded in the call-graph is displayed in the following format: For xlator with name: glusterfs @@ -61,7 +83,7 @@ max_num_allocs=3 #Maximum number of active allocations at any point in the life total_allocs=7 #Number of times this data is allocated in the life of the process. ``` -###Mempools +### Mempools Mempools are optimization to reduce the number of allocations of a data type. If we create a mem-pool of lets say 1024 elements for a data-type, new elements will be allocated from heap using syscalls like calloc, only if all the 1024 elements in the pool are in active use. @@ -81,7 +103,7 @@ cur-stdalloc=0 #Denotes the number of allocations made from heap once cold-count max-stdalloc=0 #Maximum number of allocations from heap that are in active use at any point in the life of the process. ``` -###Iobufs +### Iobufs ``` [iobuf.global] iobuf_pool=0x1f0d970 #The memory pool for iobufs @@ -92,7 +114,7 @@ iobuf_pool.arena_cnt=8 #Total number of arenas in the pool iobuf_pool.request_misses=0 #The number of iobufs that were stdalloc'd (as they exceeded the default max page size provided by iobuf_pool). ``` -There are 3 lists of arenas +There are 3 lists of arenas: 1. Arena list: arenas allocated during iobuf pool creation and the arenas that are in use(active_cnt != 0) will be part of this list. 2. Purge list: arenas that can be purged(no active iobufs, active_cnt == 0). @@ -129,7 +151,7 @@ arena.6.active_iobuf.2.ptr=0x7fdb92189000 At any given point in time if there are lots of filled arenas then that could be a sign of iobuf leaks. -###Call stack +### Call stack All the fops received by gluster are handled using call-stacks. Call stack contains the information about uid/gid/pid etc of the process that is executing the fop. Each call-stack contains different call-frames per xlator which handles that fop. ``` @@ -144,7 +166,7 @@ op=LOOKUP #Fop type=1 #Type of the op i.e. FOP/MGMT-OP cnt=9 #Number of frames in this stack. ``` -###Call-frame +### Call-frame Each frame will have information about which xlator the frame belongs to, what is the function it wound to/from and will be unwind to. It also mentions if the unwind happened or not. If we observe hangs in the system and want to find out which xlator is causing it. Take a statedump and see what is the final xlator which is yet to be unwound. ``` @@ -159,7 +181,7 @@ wind_to=priv->children[i]->fops->lookup unwind_to=afr_lookup_cbk #Parent xlator function to which unwind happened ``` -###History of operations in Fuse +### History of operations in Fuse Fuse maintains history of operations that happened in fuse. 
@@ -175,7 +197,7 @@ TIME=2014-07-09 16:44:57.523394 message=[0] fuse_getattr_resume: 4591, STAT, path: (/iozone.tmp), gfid: (3afb4968-5100-478d-91e9-76264e634c9f) ``` -###Xlator configuration +### Xlator configuration ``` [cluster/replicate.r2-replicate-0] #Xlator type, name information child_count=2 #Number of children to the xlator @@ -195,7 +217,7 @@ favorite_child=-1 wait_count=1 ``` -###Graph/inode table +### Graph/inode table ``` [active graph - 1] @@ -207,7 +229,7 @@ conn.1.bound_xl./data/brick01a/homegfs.lru_size=183 #Number of inodes present in conn.1.bound_xl./data/brick01a/homegfs.purge_size=0 #Number of inodes present in purge list ``` -###Inode +### Inode ``` [conn.1.bound_xl./data/brick01a/homegfs.active.324] #324th inode in active inode list gfid=e6d337cf-97eb-44b3-9492-379ba3f6ad42 #Gfid of the inode @@ -215,6 +237,7 @@ nlookup=13 #Number of times lookups happened from the client or from fuse kernel fd-count=4 #Number of fds opened on the inode ref=11 #Number of refs taken on the inode ia_type=1 #Type of the inode. This should be changed to some string :-( +Ref by xl:.patchy-md-cache=11 #Further this there will be a list of xlators, and the ref count taken by each of them on this inode at the time of statedump [conn.1.bound_xl./data/brick01a/homegfs.lru.1] #1st inode in lru list. Note that ref count is zero for these inodes. gfid=5114574e-69bc-412b-9e52-f13ff087c6fc @@ -222,8 +245,10 @@ nlookup=5 fd-count=0 ref=0 ia_type=2 +Ref by xl:.fuse=1 +Ref by xl:.patchy-client-0=-1 ``` -###Inode context +### Inode context For each inode per xlator some context could be stored. This context can also be printed in the statedump. Here is the inode ctx of locks xlator ``` [xlator.features.locks.homegfs-locks.inode] @@ -240,12 +265,12 @@ lock-dump.domain.domain=homegfs-replicate-0 #Domain name where entry/data operat inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=11141120, len=131072, pid = 18446744073709551615, owner=080b1ada117f0000, client=0xb7fc30, connection-id=compute-30-029.com-3505-2014/06/29-14:46:12:477358-homegfs-client-0-0-1, granted at Sun Jun 29 11:10:36 2014 #Active lock information ``` -##FAQ -###How to debug Memory leaks using statedump? +## FAQ +### How to debug Memory leaks using statedump? -####Using memory accounting feature: +#### Using memory accounting feature: -`https://bugzilla.redhat.com/show_bug.cgi?id=1120151` is one of the bugs which was debugged using statedump to see which data-structure is leaking. Here is the process used to find what the leak is using statedump. According to the bug the observation is that the process memory usage is increasing whenever one of the bricks is wiped in a replicate volume and a `full` self-heal is invoked to heal the contents. Statedump of the process is taken using kill -USR1 `<pid-of-gluster-self-heal-daemon>`. +[Bug 1120151](https://bugzilla.redhat.com/show_bug.cgi?id=1120151) is one of the bugs which was debugged using statedump to see which data-structure is leaking. Here is the process used to find what the leak is using statedump. According to the bug the observation is that the process memory usage is increasing whenever one of the bricks is wiped in a replicate volume and a `full` self-heal is invoked to heal the contents. Statedump of the process is taken using `kill -USR1 <pid-of-gluster-self-heal-daemon>`. ``` grep -w num_allocs glusterdump.5225.dump.1405493251 num_allocs=77078 @@ -268,10 +293,10 @@ grep of the statedump revealed too many allocations for the following data-types 3. gf_common_mt_mem_pool. 
After checking afr-code for allocations with tag `gf_common_mt_char` found `data-self-heal` code path does not free one such allocated memory. `gf_common_mt_mem_pool` suggests that there is a leak in pool memory. `replicate-0:dict_t`, `glusterfs:data_t` and `glusterfs:data_pair_t` pools are using lot of memory, i.e. cold_count is `0` and too many allocations. Checking source code of dict.c revealed that `key` in `dict` is allocated with `gf_common_mt_char` i.e. `2.` tag and value is created using gf_asprintf which in-turn uses `gf_common_mt_asprintf` i.e. `1.`. Browsing the code for leak in self-heal code paths lead to a line which over-writes a variable with new dictionary even when it was already holding a reference to another dictionary. After fixing these leaks, ran the same test to verify that none of the `num_allocs` are increasing even after healing 10,000 files directory hierarchy in statedump of self-heal daemon. -Please check http://review.gluster.org/8316 for more info about patch/code. +Please check this [patch](http://review.gluster.org/8316) for more info about the fix. -####Debugging leaks in memory pools: -Statedump output of memory pools was used to test and verify the fixes to https://bugzilla.redhat.com/show_bug.cgi?id=1134221. On code analysis, dict_t objects were found to be leaking (in terms of not being unref'd enough number of times, during name self-heal. The test involved creating 100 files on plain replicate volume, removing them from one of the bricks's backend, and then triggering lookup on them from the mount point. Statedump of the mount process was taken before executing the test case and after it, after compiling glusterfs with -DDEBUG flags (to have cold count set to 0 by default). +#### Debugging leaks in memory pools: +Statedump output of memory pools was used to test and verify the fixes to [Bug 1134221](https://bugzilla.redhat.com/show_bug.cgi?id=1134221). On code analysis, dict_t objects were found to be leaking (in terms of not being unref'd enough number of times, during name self-heal. The test involved creating 100 files on plain replicate volume, removing them from one of the brick's backend, and then triggering lookup on them from the mount point. Statedump of the mount process was taken before executing the test case and after it, after compiling glusterfs with -DDEBUG flags (to have cold count set to 0 by default). Statedump output of the fuse mount process before the test case was executed: @@ -303,7 +328,7 @@ cur-stdalloc=214 max-stdalloc=220 ``` -Here, with cold count being 0 by default, cur-stdalloc indicated the number of dict_t objects that were allocated in heap using mem_get(), and yet to be freed using mem_put() (refer to https://github.com/gluster/glusterfs/blob/master/doc/data-structures/mem-pool.md for more details on how mempool works). After the test case (name selfheal of 100 files), there was a rise in the cur-stdalloc value (from 14 to 214) for dict_t. +Here, with cold count being 0 by default, `cur-stdalloc` indicated the number of `dict_t` objects that were allocated in heap using `mem_get()`, and yet to be freed using `mem_put()` (refer to this [page](../developer-guide/datastructure-mem-pool.md) for more details on how mempool works). After the test case (name selfheal of 100 files), there was a rise in the cur-stdalloc value (from 14 to 214) for `dict_t`. 
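Extracting that counter from the two dumps can be scripted. Below is a hedged sketch; the statedump file names are placeholders, and the pool and field names follow the snippets shown above:
```
# Print cur-stdalloc for the dict_t mempool in each statedump to
# compare the counts before and after running the test case.
for dump in statedump.before statedump.after; do
    echo "== $dump =="
    grep -A 10 'dict_t' "$dump" | grep -w cur-stdalloc
done
```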
After these leaks were fixed, glusterfs was again compiled with -DDEBUG flags, and the same steps were performed again and statedump was taken before and after executing the test case, of the mount. This was done to ascertain the validity of the fix. And the following are the results: @@ -337,8 +362,8 @@ max-stdalloc=119 ``` The value of cur-stdalloc remained 14 before and after the test, indicating that the fix indeed does what it's supposed to do. -###How to debug hangs because of frame-loss? -`https://bugzilla.redhat.com/show_bug.cgi?id=994959` is one of the bugs where statedump was helpful in finding where the frame was lost. Here is the process used to find where the hang is using statedump. +### How to debug hangs because of frame-loss? +[Bug 994959](https://bugzilla.redhat.com/show_bug.cgi?id=994959) is one of the bugs where statedump was helpful in finding where the frame was lost. Here is the process used to find where the hang is using statedump. When the hang was observed, statedumps are taken for all the processes. On mount's statedump the following stack is shown: ``` [global.callpool.stack.1.frame.1] @@ -386,4 +411,4 @@ unwind_to=qr_readdirp_cbk ``` `unwind_to` shows that call was unwound to `afr_readdirp_cbk` from client xlator. Inspecting that function revealed that afr is not unwinding the stack when fop failed. -Check http://review.gluster.org/5531 for more info about patch/code changes. +Check this [patch](http://review.gluster.org/5531) for more info about the fix. diff --git a/doc/developer-guide/Language-Bindings.md b/doc/developer-guide/Language-Bindings.md index 89ef6df3d78..951f5fae2f6 100644 --- a/doc/developer-guide/Language-Bindings.md +++ b/doc/developer-guide/Language-Bindings.md @@ -1,10 +1,11 @@ +# Language Bindings GlusterFS 3.4 introduced the libgfapi client API for C programs. This page lists bindings to the libgfapi C library from other languages. Go -- -- [gogfapi](https://forge.gluster.org/gogfapi) - Go language bindings +- [gogfapi](https://github.com/gluster/gogfapi) - Go language bindings for libgfapi, aiming to provide an api consistent with the default Go file apis. @@ -37,3 +38,8 @@ Rust - [gfapi-sys](https://github.com/cholcombe973/Gfapi-sys) - Libgfapi bindings for Rust using FFI +Perl +---- + +- [libgfapi-perl](https://github.com/gluster/libgfapi-perl) - Libgfapi + bindings for Perl using FFI diff --git a/doc/developer-guide/Developers-Index.md b/doc/developer-guide/README.md index 9bcbcdc4cbe..aaf9c7476b0 100644 --- a/doc/developer-guide/Developers-Index.md +++ b/doc/developer-guide/README.md @@ -18,11 +18,9 @@ code check-in. the GPL v2 and the LGPL v3 or later - [GlusterFS Coding Standards](./coding-standard.md) -Developing ----------- +- If you are not sure of where to start, and what to do, we have a small + write-up on what you can pick. 
[Check it out](./options-to-contribute.md) - -- [Language Bindings](./Language Bindings.md) - Connect to - GlusterFS using various language bindings Adding File operations ---------------------- @@ -53,20 +51,30 @@ Daemon Management Framework Translators ----------- -- [Block Device Tanslator](./bd-xlator.md) - [Performance/write-Behind Translator](./write-behind.md) - [Translator Development](./translator-development.md) - [Storage/posix Translator](./posix.md) -- [Compression translator](./network_compression.md) + + +Brick multiplex +--------------- + +- [Brick mux resource reduction](./brickmux-thread-reduction.md) + +Fuse +---- + +- [Interrupt handling](./fuse-interrupt.md) Testing/Debugging ----------------- - [Unit Tests in GlusterFS](./unittest.md) - [Using the Gluster Test - Framework](./Using Gluster Test Framework.md) - Step by + Framework](./Using-Gluster-Test-Framework.md) - Step by step instructions for running the Gluster Test Framework -- [Coredump Analysis](./coredump-analysis.md) - Steps to analize coredumps generated by regression machines. +- [Coredump Analysis](../debugging/analyzing-regression-cores.md) - Steps to analyze coredumps generated by regression machines. +- [Identifying Resource Leaks](./identifying-resource-leaks.md) Release Process --------------- diff --git a/doc/developer-guide/Using-Gluster-Test-Framework.md b/doc/developer-guide/Using-Gluster-Test-Framework.md index 96fa9247e84..d2bb1c391da 100644 --- a/doc/developer-guide/Using-Gluster-Test-Framework.md +++ b/doc/developer-guide/Using-Gluster-Test-Framework.md @@ -1,3 +1,4 @@ +# Using Gluster Test Framework Description ----------- diff --git a/doc/developer-guide/afr-locks-evolution.md b/doc/developer-guide/afr-locks-evolution.md index 7d2a136d871..2dabbcfeb13 100644 --- a/doc/developer-guide/afr-locks-evolution.md +++ b/doc/developer-guide/afr-locks-evolution.md @@ -32,10 +32,10 @@ AFR makes use of locks xlator extensively: * For Entry self-heal, it is `entrylk(NULL name, parent inode)`. Specifying NULL for the name takes full lock on the directory referred to by the inode. * For data self-heal, there is a bit of history as to how locks evolved: -###Initial version (say version 1) : +### Initial version (say version 1): There was no concept of selfheal daemon (shd). Only client lookups triggered heals. So AFR always took `inodelk(0,0,DATA_DOMAIN)` for healing. The issue with this approach was that when heal was in progress, I/O from clients was blocked. -###version 2: +### version 2: shd was introduced. We needed to allow I/O to go through when heal was going, provided the ranges did not overlap. To that extent, the following approach was adopted: + 1.shd takes (full inodelk in DATA_DOMAIN). Thus client FOPS are blocked and cannot modify changelog-xattrs @@ -79,7 +79,7 @@ It modifies data but the FOP succeeds only on brick 2. writev returns success, and thus goes ahead and copies stale 128Kb from brick 1 to brick 2. Thus as far as the application is concerned, `writev` returned success but bricks have stale data. What needs to be done is `writev` must return success only if it succeeded on at least one source brick (brick b1 in this case). Otherwise the heal still happens in reverse direction but as far as the application is concerned, it received an error. -###Note on lock **domains** +### Note on lock **domains** We have used conceptual names in this document like DATA_DOMAIN/ METADATA_DOMAIN/ SELF_HEAL_DOMAIN.
In the code, these are mapped to strings that are based on the AFR xlator name like so: DATA_DOMAIN --->"vol_name-replicate-n" diff --git a/doc/developer-guide/afr-self-heal-daemon.md b/doc/developer-guide/afr-self-heal-daemon.md index b85ddd1c856..65940d420b7 100644 --- a/doc/developer-guide/afr-self-heal-daemon.md +++ b/doc/developer-guide/afr-self-heal-daemon.md @@ -39,7 +39,7 @@ When a client (mount) performs an operation on the file, the index xlator presen and removes it in post-op phase if the operation is successful. Thus if an entry is present inside the .glusterfs/indices/xattrop/ directory when there is no I/O happening on the file, it means the file needs healing (or atleast an examination if the brick crashed after the post-op completed but just before the removal of the hardlink). -####Index heal steps: +#### Index heal steps: <pre><code> In shd process of *each node* { opendir +readdir (.glusterfs/indices/xattrop/) diff --git a/doc/developer-guide/bd-xlator.md b/doc/developer-guide/bd-xlator.md deleted file mode 100644 index 1771fb6e24b..00000000000 --- a/doc/developer-guide/bd-xlator.md +++ /dev/null @@ -1,469 +0,0 @@ -#Block device translator - -Block device translator (BD xlator) is a translator added to GlusterFS which provides block backend for GlusterFS. This replaces the existing bd_map translator in GlusterFS that provided similar but very limited functionality. GlusterFS expects the underlying brick to be formatted with a POSIX compatible file system. BD xlator changes that and allows for having bricks that are raw block devices like LVM which needn’t have any file systems on them. Hence with BD xlator, it becomes possible to build a GlusterFS volume comprising of bricks that are logical volumes (LV). - -##bd - -BD xlator maps underlying LVs to files and hence the LVs appear as files to GlusterFS clients. Though BD volume externally appears very similar to the usual Posix volume, not all operations are supported or possible for the files on a BD volume. Only those operations that make sense for a block device are supported and the exact semantics are described in subsequent sections. - -While Posix volume takes a file system directory as brick, BD volume needs a volume group (VG) as brick. In the usual use case of BD volume, a file created on BD volume will result in an LV being created in the brick VG. In addition to a VG, BD volume also needs a file system directory that should be specified at the volume creation time. This directory is necessary for supporting the notion of directories and directory hierarchy for the BD volume. Metadata about LVs (size, mapping info) is stored in this directory. - -BD xlator was mainly developed to use block devices directly as VM images when GlusterFS is used as storage for KVM virtualization. Some of the salient points of BD xlator are - -* Since BD supports file level snapshots and clones by leveraging the snapshot and clone capabilities of LVM, it can be used to fully off-load snapshot and cloning operations from QEMU to the storage (GlusterFS) itself. - -* BD understands dm-thin LVs and hence can support files that are backed by thinly provisioned LVs. This capability of BD xlator translates to having thinly provisioned raw VM images. - -* BD enables thin LVs from a thin pool to be used from multiple nodes that have visibility to GlusterFS BD volume. Thus thin pool can be used as a VM image repository allowing access/visibility to it from multiple nodes. 
- -* BD supports true zerofill by using BLKZEROOUT ioctl on underlying block devices. Thus BD allows SCSI WRITESAME to be used on underlying block device if the device supports it. - -Though BD xlator is primarily intended to be used with block devices, it does provide full Posix xlator compatibility for files that are created on BD volume but are not backed by or mapped to a block device. Such files which don’t have a block device mapping exist on the Posix directory that is specified during BD volume creation. BD xlator is available from GlusterFS-3.5 release. - -###Compiling BD translator - -BD xlator needs lvm2 development library. –enable-bd-xlator option can be used with `./configure` script to explicitly enable BD translator. The following snippet from the output of configure script shows that BD xlator is enabled for compilation. - - -#####GlusterFS configure summary - - … - Block Device xlator : yes - - -###Creating a BD volume - -BD supports hosting of both linear LV and thin LV within the same volume. However seperate examples are provided below. As noted above, the prerequisite for a BD volume is VG which is created from a loop device here, but it can be any other device too. - - -* Creating BD volume with linear LV backend - -* Create a loop device - - - [root@node ~]# dd if=/dev/zero of=bd-loop count=1024 bs=1M - - [root@node ~]# losetup /dev/loop0 bd-loop - - -* Prepare a brick by creating a VG - - [root@node ~]# pvcreate /dev/loop0 - - [root@node ~]# vgcreate bd-vg /dev/loop0 - - -* Create the BD volume - -* Create a POSIX directory first - - - [root@node ~]# mkdir /bd-meta - -It is recommended that this directory is created on an LV in the brick VG itself so that both data and metadata live together on the same device. - - -* Create and mount the volume - - [root@node ~]# gluster volume create bd node:/bd-meta?bd-vg force - - -The general syntax for specifying the brick is `host:/posix-dir?volume-group-name` where “?” is the separator. - - - - [root@node ~]# gluster volume start bd - [root@node ~]# gluster volume info bd - Volume Name: bd - Type: Distribute - Volume ID: cb042d2a-f435-4669-b886-55f5927a4d7f - Status: Started - Xlator 1: BD - Capability 1: offload_copy - Capability 2: offload_snapshot - Number of Bricks: 1 - Transport-type: tcp - Bricks: - Brick1: node:/bd-meta - Brick1 VG: bd-vg - - - - [root@node ~]# mount -t glusterfs node:/bd /mnt - -* Create a file that is backed by an LV - - [root@node ~]# ls /mnt - - [root@node ~]# - -Since the volume is empty now, so is the underlying VG. - - [root@node ~]# lvdisplay bd-vg - [root@node ~]# - -Creating a file that is mapped to an LV is a 2 step operation. First the file should be created on the mount point and a specific extended attribute should be set to map the file to LV. - - [root@node ~]# touch /mnt/lv - [root@node ~]# setfattr -n “user.glusterfs.bd” -v “lv” /mnt/lv - -Now an LV got created in the VG brick and the file /mnt/lv maps to this LV. Any read/write to this file ends up as read/write to the underlying LV. 
- - [root@node ~]# lvdisplay bd-vg - — Logical volume — - LV Path /dev/bd-vg/6ff0f25f-2776-4d19-adfb-df1a3cab8287 - LV Name 6ff0f25f-2776-4d19-adfb-df1a3cab8287 - VG Name bd-vg - LV UUID PjMPcc-RkD5-RADz-6ixG-UYsk-oclz-vL0nv6 - LV Write Access read/write - LV Creation host, time node, 2013-11-26 16:15:45 +0530 - LV Status available - open 0 - LV Size 4.00 MiB - Current LE 1 - Segments 1 - Allocation inherit - Read ahead sectors 0 - Block device 253:6 - -The file gets created with default LV size which is 1 LE which is 4MB in this case. - - [root@node ~]# ls -lh /mnt/lv - -rw-r–r–. 1 root root 4.0M Nov 26 16:15 /mnt/lv - -truncate can be used to set the required file size. - - [root@node ~]# truncate /mnt/lv -s 256M - [root@node ~]# lvdisplay bd-vg - — Logical volume — - LV Path /dev/bd-vg/6ff0f25f-2776-4d19-adfb-df1a3cab8287 - LV Name 6ff0f25f-2776-4d19-adfb-df1a3cab8287 - VG Name bd-vg - LV UUID PjMPcc-RkD5-RADz-6ixG-UYsk-oclz-vL0nv6 - LV Write Access read/write - LV Creation host, time node, 2013-11-26 16:15:45 +0530 - LV Status available - # open 0 - LV Size 256.00 MiB - Current LE 64 - Segments 1 - Allocation inherit - Read ahead sectors 0 - Block device 253:6 - - - [root@node ~]# ls -lh /mnt/lv - -rw-r–r–. 1 root root 256M Nov 26 16:15 /mnt/lv - - currently LV size has been set to 256 - -The size of the file/LV can be specified during creation/mapping time itself like this: - - setfattr -n “user.glusterfs.bd” -v “lv:256MB” /mnt/lv - -2. Creating BD volume with thin LV backend - -* Create a loop device - - - [root@node ~]# dd if=/dev/zero of=bd-loop-thin count=1024 bs=1M - - [root@node ~]# losetup /dev/loop0 bd-loop-thin - - -* Prepare a brick by creating a VG and thin pool - - - [root@node ~]# pvcreate /dev/loop0 - - [root@node ~]# vgcreate bd-vg-thin /dev/loop0 - - -* Create a thin pool - - - [root@node ~]# lvcreate –thin bd-vg-thin -L 1000M - - Rounding up size to full physical extent 4.00 MiB - Logical volume “lvol0″ created - -lvdisplay shows the thin pool - - [root@node ~]# lvdisplay bd-vg-thin - — Logical volume — - LV Name lvol0 - VG Name bd-vg-thin - LV UUID HVa3EM-IVMS-QG2g-oqU6-1UxC-RgqS-g8zhVn - LV Write Access read/write - LV Creation host, time node, 2013-11-26 16:39:06 +0530 - LV Pool transaction ID 0 - LV Pool metadata lvol0_tmeta - LV Pool data lvol0_tdata - LV Pool chunk size 64.00 KiB - LV Zero new blocks yes - LV Status available - # open 0 - LV Size 1000.00 MiB - Allocated pool data 0.00% - Allocated metadata 0.88% - Current LE 250 - Segments 1 - Allocation inherit - Read ahead sectors auto - Block device 253:9 - -* Create the BD volume - -* Create a POSIX directory first - - - [root@node ~]# mkdir /bd-meta-thin - -* Create and mount the volume - - [root@node ~]# gluster volume create bd-thin node:/bd-meta-thin?bd-vg-thin force - - [root@node ~]# gluster volume start bd-thin - - - [root@node ~]# gluster volume info bd-thin - Volume Name: bd-thin - Type: Distribute - Volume ID: 27aa7eb0-4ffa-497e-b639-7cbda0128793 - Status: Started - Xlator 1: BD - Capability 1: thin - Capability 2: offload_copy - Capability 3: offload_snapshot - Number of Bricks: 1 - Transport-type: tcp - Bricks: - Brick1: node:/bd-meta-thin - Brick1 VG: bd-vg-thin - - - [root@node ~]# mount -t glusterfs node:/bd-thin /mnt - -* Create a file that is backed by a thin LV - - - [root@node ~]# ls /mnt - - [root@node ~]# - -Creating a file that is mapped to a thin LV is a 2 step operation. 
First the file should be created on the mount point and a specific extended attribute should be set to map the file to a thin LV. - - [root@node ~]# touch /mnt/thin-lv - - [root@node ~]# setfattr -n “user.glusterfs.bd” -v “thin:256MB” /mnt/thin-lv - -Now /mnt/thin-lv is a thin provisioned file that is backed by a thin LV and size has been set to 256. - - [root@node ~]# lvdisplay bd-vg-thin - — Logical volume — - LV Name lvol0 - VG Name bd-vg-thin - LV UUID HVa3EM-IVMS-QG2g-oqU6-1UxC-RgqS-g8zhVn - LV Write Access read/write - LV Creation host, time node, 2013-11-26 16:39:06 +0530 - LV Pool transaction ID 1 - LV Pool metadata lvol0_tmeta - LV Pool data lvol0_tdata - LV Pool chunk size 64.00 KiB - LV Zero new blocks yes - LV Status available - # open 0 - LV Size 000.00 MiB - Allocated pool data 0.00% - Allocated metadata 0.98% - Current LE 250 - Segments 1 - Allocation inherit - Read ahead sectors auto - Block device 253:9 - - - - - — Logical volume — - LV Path dev/bd-vg-thin/081b01d1-1436-4306-9baf-41c7bf5a2c73 - LV Name 081b01d1-1436-4306-9baf-41c7bf5a2c73 - VG Name bd-vg-thin - LV UUID coxpTY-2UZl-9293-8H2X-eAZn-wSp6-csZIeB - LV Write Access read/write - LV Creation host, time node, 2013-11-26 16:43:19 +0530 - LV Pool name lvol0 - LV Status available - # open 0 - LV Size 256.00 MiB - Mapped size 0.00% - Current LE 64 - Segments 1 - Allocation inherit - Read ahead sectors auto - Block device 253:10 - - - - - -As can be seen from above, creation of a file resulted in creation of a thin LV in the brick. - - -###Improvisation on BD translator: - -First version of BD xlator ( block backend) had few limitations such as - -* Creation of directories not supported -* Supports only single brick -* Does not use extended attributes (and client gfid) like posix xlator -* Creation of special files (symbolic links, device nodes etc) not - supported - -Basic limitation of not allowing directory creation was blocking -oVirt/VDSM to consume BD xlator as part of Gluster domain since VDSM -creates multi-level directories when GlusterFS is used as storage -backend for storing VM images. - -To overcome these limitations a new BD xlator with following -improvements are implemented. - -* New hybrid BD xlator that handles both regular files and block device - files -* The volume will have both POSIX and BD bricks. Regular files are - created on POSIX bricks, block devices are created on the BD brick (VG) -* BD xlator leverages exiting POSIX xlator for most POSIX calls and - hence sits above the POSIX xlator -* Block device file is differentiated from regular file by an extended - attribute -* The xattr 'user.glusterfs.bd' (BD_XATTR) plays a role in mapping a - posix file to Logical Volume (LV). -* When a client sends a request to set BD_XATTR on a posix file, a new - LV is created and mapped to posix file. So every block device will - have a representative file in POSIX brick with 'user.glusterfs.bd' - (BD_XATTR) set. -* Here after all operations on this file results in LV related - operations. - -For example, opening a file that has BD_XATTR set results in opening -the LV block device, reading results in reading the corresponding LV -block device. - -When BD xlator gets request to set BD_XATTR via setxattr call, it -creates a LV and information about this LV is placed in the xattr of the -posix file. xattr "user.glusterfs.bd" used to identify that posix file -is mapped to BD. 
- -Usage: -Server side: - - [root@host1 ~]# gluster volume create bdvol host1:/storage/vg1_info?vg1 host2:/storage/vg2_info?vg2 - -It creates a distributed gluster volume 'bdvol' with Volume Group vg1 -using posix brick /storage/vg1_info in host1 and Volume Group vg2 using -/storage/vg2_info in host2. - - - [root@host1 ~]# gluster volume start bdvol - -Client side: - - [root@node ~]# mount -t glusterfs host1:/bdvol /media - [root@node ~]# touch /media/posix - -It creates regular posix file 'posix' in either host1:/vg1 or host2:/vg2 brick - - [root@node ~]# mkdir /media/image - - [root@node ~]# touch /media/image/lv1 - - -It also creates regular posix file 'lv1' in either host1:/vg1 or -host2:/vg2 brick - - [root@node ~]# setfattr -n "user.glusterfs.bd" -v "lv" /media/image/lv1 - - [root@node ~]# - - -Above setxattr results in creating a new LV in corresponding brick's VG -and it sets 'user.glusterfs.bd' with value 'lv:<default-extent-size'' - - - [root@node ~]# truncate -s5G /media/image/lv1 - - -It results in resizig LV 'lv1'to 5G - -New BD xlator code is placed in `xlators/storage/bd` directory. - -Also add volume-uuid to the VG so that same VG cannot be used for other -bricks/volumes. After deleting a gluster volume, one has to manually -remove the associated tag using vgchange <vg-name> --deltag -`<trusted.glusterfs.volume-id:<volume-id>>` - - -#### Exposing volume capabilities - -With multiple storage translators (posix and bd) being supported in GlusterFS, it becomes -necessary to know the volume type so that user can issue appropriate calls that are relevant -only to the a given volume type. Hence there needs to be a way to expose the type of -the storage translator of the volume to the user. - -BD xlator is capable of providing server offloaded file copy, server/storage offloaded -zeroing of a file etc. This capabilities should be visible to the client/user, so that these -features can be exploited. - -BD xlator exports capability information through gluster volume info (and --xml) output. For eg: - -`snip of gluster volume info output for a BD based volume` - - Xlator 1: BD - Capability 1: thin - -`snip of gluster volume info --xml output for a BD based volume` - - <xlators> - <xlator> - <name>BD</name> - <capabilities> - <capability>thin</capability> - </capabilities> - </xlator> - </xlators> - -But this capability information should also exposed through some other means so that a host -which is not part of Gluster peer could also avail this capabilities. - -* Type - -BD translator supports both regular files and block device, i,e., one can create files on -GlusterFS volume backed by BD translator and this file could end up as regular posix file or -a logical volume (block device) based on the user''s choice. User can do a setxattr on the -created file to convert it to a logical volume. - -Users of BD backed volume like QEMU would like to know that it is working with BD type of volume -so that it can issue an additional setxattr call after creating a VM image on GlusterFS backend. -This is necessary to ensure that the created VM image is backed by LV instead of file. - -There are different ways to expose this information (BD type of volume) to user. -One way is to export it via a `getxattr` call. That said, When a client issues getxattr("volume_type") -on a root gfid, bd xlator will return 1 implying its BD xlator. But posix xlator will return ENODATA -and client code can interpret this as posix xlator. Also capability list can be returned via -getxattr("caps") for root gfid. 
- -* Capabilities - -BD xlator supports new features such as server offloaded file copy, thin provisioned VM images etc. - -There is no standard way of exploiting these features from client side (such as syscall -to exploit server offloaded copy). So these features need to be exported to the client so that -they can be used. BD xlator latest version exports these capabilities information through -gluster volume info (and --xml) output. But if a client is not part of GlusterFS peer -it can''t run volume info command to get the list of capabilities of a given GlusterFS volume. -For example, GlusterFS block driver in qemu need to get the capability list so that these features are used. - - - -Parts of this documentation were originally published here -#http://raobharata.wordpress.com/2013/11/27/glusterfs-block-device-translator/ diff --git a/doc/developer-guide/brickmux-thread-reduction.md b/doc/developer-guide/brickmux-thread-reduction.md new file mode 100644 index 00000000000..7d76e8ff579 --- /dev/null +++ b/doc/developer-guide/brickmux-thread-reduction.md @@ -0,0 +1,64 @@ +# Resource usage reduction in brick multiplexing + +Each brick is represented with a graph of translators in a brick process. +Each translator in the graph has its own set of threads and mem pools +and other system resource allocations. Most of the time, all these +resources are not put to full use. Reducing the resource consumption +of each brick is a problem in itself that needs to be addressed. The other +aspect to it is sharing of resources across the brick graph; this becomes +critical in the brick multiplexing scenario. In this document we will be discussing +only the threads. + +If a brick mux process hosts 50 bricks there are at least 600+ threads created +in that process. Some of these are global threads that are shared by all the +brick graphs, and others are per translator threads. The global threads like +synctask threads, timer threads, sigwaiter, poller etc. are configurable and +do not need to be reduced. The per translator threads keep growing as the +number of bricks in the process increases. Each brick spawns at least 10+ +threads: +- io-threads +- posix threads: + 1. Janitor + 2. Fsyncer + 3. Helper + 4. aio-thread +- changelog and bitrot threads (even when the features are not enabled) + +## io-threads + +io-threads should be made global to the process, having 16+ threads for +each brick does not make sense. But the io-thread translator is loaded in +the graph, and the position of the io-thread translator decides from when +the fops will be parallelised across threads. We cannot entirely move +the io-threads to libglusterfs and say the multiplexing happens from +the master translator or so. Hence, the io-thread orchestrator code +is moved to libglusterfs, which ensures there is only one set of +io-threads that is shared among the io-threads translator in each brick. +This poses performance issues due to lock-contention in the io-threads +layer. This shall also be addressed by having multiple locks instead of +one global lock for io-threads. + +## Posix threads +Most of the posix threads execute tasks in a timely manner, hence they can be +replaced with a timer whose handler registers a task to the synctask framework; once +the task is complete, the timer is registered again. With this we can eliminate +the need of one thread for each task. The problem with using synctasks is +the performance impact it will have due to make/swapcontext.
+
+And the other challenge is to cancel all the tasks pending from a translator.
+This is important to cleanly detach a brick. For this, we need to implement an
+API in synctask that can cancel all the tasks from a given translator.
+
+In the future, this will be replaced to use the global thread-pool (once implemented).
+
+## Changelog and bitrot threads
+
+In the initial implementation, the threads are not created if the feature is
+not enabled. We need to share threads across changelog instances if we plan
+to enable these features in the brick mux scenario.
+
diff --git a/doc/developer-guide/coding-standard.md b/doc/developer-guide/coding-standard.md
index 368c5553464..031c6c0da99
--- a/doc/developer-guide/coding-standard.md
+++ b/doc/developer-guide/coding-standard.md
@@ -1,11 +1,38 @@
GlusterFS Coding Standards
==========================
+Before you get started
+----------------------
+Before starting with the other parts of the coding standard, install `clang-format`.
+
+On Fedora:
+```
+$ dnf install clang
+```
+On debian/Ubuntu:
+```
+$ apt-get install clang
+```
+Once you are done with all the local changes, you need to run the below set of
+commands before submitting the patch for review.
+```
+$ git add $file # if any
+$ git commit -a -s -m "commit message"
+$ git show --pretty="format:" --name-only | grep -v "contrib/" | egrep "*\.[ch]$" | xargs clang-format -i
+$ git diff # see if there are any changes
+$ git commit -a --amend # get the format changes done
+$ ./submit-for-review.sh
+```
+
+
Structure definitions should have a comment per member
------------------------------------------------------
-Every member in a structure definition must have a comment about its
-purpose. The comment should be descriptive without being overly verbose.
+Every member in a structure definition must have a comment about its purpose.
+The comment should be descriptive without being overly verbose. For pointer
+members, lifecycle concerns for the pointed-to object should be noted. For lock
+members, the relationship between the lock member and the other members it
+protects should be explicit.
*Bad:*
@@ -23,59 +50,182 @@ DBTYPE access_mode; /* access mode for accessing */
```
-Declare all variables at the beginning of the function
------------------------------------------------------
+Structure members should be aligned based on the padding requirements
+---------------------------------------------------------------------
-All local variables in a function must be declared immediately after the
-opening brace. This makes it easy to keep track of memory that needs to be freed
-during exit. It also helps debugging, since gdb cannot handle variables
-declared inside loops or other such blocks.
+The compiler will make sure that structure members have optimum alignment,
+but at the expense of suboptimal padding. More important is to optimize the
+padding. The compiler won't do that for you.
-Always initialize local variables
---------------------------------
+This will also help utilize the memory better.
-Every local variable should be initialized to a sensible default value
-at the point of its declaration.
All pointers should be initialized to NULL,
-and all integers should be zero or (if it makes sense) an error value.
+*Bad:*
+```
+struct bad {
+    bool b;        /* 0 */
+    /* 1..7 pad */
+    void *p;       /* 8..15 */
+    char c;        /* 16 */
+    char a[16];    /* 17..32 */
+    /* 33..39 pad */
+    int64_t ii;    /* 40..47 */
+    int32_t i;     /* 48..51 */
+    /* 52..55 pad */
+    int64_t iii;   /* 56..63 */
+};
+```
+*Good:*
+```
+struct good {
+    int64_t ii;    /* explicit 64-bit types */
+    void *p;       /* may be 64- or 32-bit */
+    long l;        /* may be 64- or 32-bit */
+    int i;         /* 32-bit */
+    short s;       /* 16-bit */
+    char c;        /* 8-bit */
+    bool b;        /* 8-bit */
+    char a[1024];
+};
+```
+Make sure the items with the most stringent alignment requirements come
+earliest (i.e. pointers and perhaps uint64_t etc.), and those with less
+stringent alignment requirements come at the end (uint16/uint8 and char). Also
+note that the long array (if any) should be at the end of the structure,
+regardless of the type.
+
+Also note, if your structure's overall size crosses the 1k-4k limit, it is
+recommended to mention the reason why the particular structure needs so much
+memory as a comment at the top.
+
+Use \_typename for struct tags and typename\_t for typedefs
+---------------------------------------------------------
+
+Being consistent here makes it possible to automate navigation from use of a
+type to its true definition (not just the typedef).
+
+*Bad:*
+
+```
+struct thing {...};
+struct thing_t {...};
+typedef struct _thing thing;
+```
*Good:*
```
-int ret = 0;
-char *databuf = NULL;
-int _fd = -1;
+typedef struct _thing {...} thing_t;
```
-Initialization should always be done with a constant value
----------------------------------------------------------
+No double underscores
+---------------------
+
+Identifiers beginning with double underscores are supposed to be reserved for
+the compiler.
-Never use a non-constant expression as the initialization value for a variable.
+http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
+When you need to define inner/outer functions, use a different prefix/suffix.
*Bad:*
```
+void __do_something (void);
+
+void
+do_something (void)
+{
+    LOCK ();
+    __do_something ();
+    UNLOCK ();
+}
+```
+
+*Good:*
+
+```
+void do_something_locked (void);
+```
+
+Only use safe pointers in initializers
+----------------------------------------------------------
+
+Some pointers, such as `this` in a fop function, can be assumed to be non-NULL.
+However, other parameters and further-derived values might be NULL.
+
+*Good:*
+
+```
pid_t pid = frame->root->pid;
-char *databuf = malloc (1024);
```
+
+*Bad:*
+
+```
+data_t *my_data = dict_get (xdata, "fubar");
+```
+
+No giant stack allocations
+--------------------------
+
+Synctasks have small finite stacks. To avoid overflowing these stacks, avoid
+allocating any large data structures on the stack. Use dynamic allocation
+instead.
+
+*Bad:*
+
+```
+gf_boolean_t port_inuse[65536]; /* 256KB, this actually happened */
+```
+
+NOTE: The ideal is to limit any stack array to less than 256 bytes.
+
+
+Character array initializing
+----------------------------
+
+It is recommended to initialize character arrays to the empty string.
+
+*Good:*
+```
+char msg[1024] = "";
+```
+
+The following is not recommended as much, even though it means the same.
+
+```
+char msg[1024] = {0,};
+```
+
+We recommend the former over the structure-style initialization.
+
+
+
Validate all arguments to a function
------------------------------------
All pointer arguments to a function must be checked for `NULL`.
-A macro named `VALIDATE` (in `common-utils.h`)
-takes one argument, and if it is `NULL`, writes a log message and
-jumps to a label called `err` after setting op_ret and op_errno
-appropriately. It is recommended to use this template.
+A macro named `GF_VALIDATE_OR_GOTO` (in `common-utils.h`)
+takes two arguments; if the first is `NULL`, it writes a log message and
+jumps to a label specified by the second argument after setting errno
+appropriately. There are several variants of this macro for more
+specific purposes, and their use is recommended.
+
+*Bad:*
+```
+/* top of function */
+ret = dict_get (xdata, ...)
+```
*Good:*
```
-VALIDATE(frame);
-VALIDATE(this);
-VALIDATE(inode);
+/* top of function */
+GF_VALIDATE_OR_GOTO (xdata, out);
+ret = dict_get (xdata, ...)
```
Never rely on precedence of operators
@@ -83,25 +233,34 @@ Never rely on precedence of operators
Never write code that relies on the precedence of operators to
execute correctly. Such code can be hard to read and someone else might not
-know the precedence of operators as accurately as you do.
+know the precedence of operators as accurately as you do. This includes
+the precedence of increment/decrement vs. field/subscript. The only exceptions
+are arithmetic operators (which have had defined precedence since before
+computers even existed) and boolean negation.
*Bad:*
```
if (op_ret == -1 && errno != ENOENT)
+++foo->bar /* incrementing foo, or incrementing foo->bar? */
+a && b || !c
```
*Good:*
```
if ((op_ret == -1) && (errno != ENOENT))
+(++foo)->bar
+++(foo->bar)
+(a && b) || !c
+a && (b || !c)
```
Use exactly matching types
--------------------------
Use a variable of the exact type declared in the manual to hold the
-return value of a function. Do not use an ``equivalent'' type.
+return value of a function. Do not use an 'equivalent' type.
*Bad:*
@@ -116,42 +275,56 @@ int len = strlen (path);
size_t len = strlen (path);
```
-Never write code such as `foo->bar->baz`; check every pointer
-------------------------------------------------------------
-Do not write code that blindly follows a chain of pointer
-references. Any pointer in the chain may be `NULL` and thus
-cause a crash. Verify that each pointer is non-null before following
-it.
+Avoid code such as `foo->bar->baz`; check every pointer
-------------------------------------------------------------
+Do not write code that blindly follows a chain of pointer references. Any
+pointer in the chain may be `NULL` and thus cause a crash. Verify that each
+pointer is non-null before following it. Even if `foo->bar` has been checked
+and is known safe, repeating it can make code more verbose and less clear.
-Check return value of all functions and system calls
+This rule includes `[]` as well as `->` because both dereference pointers.
+
+*Bad:*
+
+```
+foo->bar->field1 = value1;
+xyz = foo->bar->field2 + foo->bar->field3 * foo->bar->field4;
+foo->bar[5].baz
+```
+
+*Good:*
+
+```
+my_bar = foo->bar;
+if (!my_bar) ... return;
+my_bar->field1 = value1;
+xyz = my_bar->field2 + my_bar->field3 * my_bar->field4;
+```
+
+Document unchecked return values
----------------------------------------------------
-The return value of all system calls and API functions must be checked
-for success or failure.
+In general, return values should be checked. If a function is being called
+for its side effects and the return value really doesn't matter, an explicit
+cast to void is required (to keep static analyzers happy) and a comment is
+recommended.
*Bad:*
```
close (fd);
+do_important_thing ();
```
-*Good:*
+*Good (or at least OK):*
```
-op_ret = close (_fd);
-if (op_ret == -1) {
-    gf_log (this->name, GF_LOG_ERROR,
-            "close on file %s failed (%s)", real_path,
-            strerror (errno));
-    op_errno = errno;
-    goto out;
-}
+(void) sleep (1);
```
-
-Gracefully handle failure of malloc
------------------------------------
+Gracefully handle failure of malloc (and other allocation functions)
+--------------------------------------------------------------------
GlusterFS should never crash or exit due to lack of memory. If a
memory allocation fails, the call should be unwound and an error
@@ -176,7 +349,7 @@ int32_t dict_get_int32 (dict_t *this, char *key);
int dict_get_int32 (dict_t *this, char *key, int32_t *val);
```
-Always use the `n' versions of string functions
+Always use the 'n' versions of string functions
-----------------------------------------------
Unless impossible, use the length-limited versions of the string functions.
@@ -193,18 +366,43 @@ strcpy (entry_path, real_path);
strncpy (entry_path, real_path, entry_path_len);
```
+Do not use memset prior to sprintf/snprintf/vsnprintf etc...
+------------------------------------------------------------
+snprintf (and other similar string functions) terminates the buffer with a
+'\0' (null character). Hence, there is no need to do a memset before using
+snprintf. (Of course you need to account for one extra byte for the null
+character in your allocation.)
+
+Note: Similarly, if you are doing pre-allocation of memory for the buffer, use
+GF_MALLOC instead of GF_CALLOC, since the latter is a bit costlier.
+
+*Bad:*
+
+```
+char buffer[x];
+memset (buffer, 0, x);
+bytes_read = snprintf (buffer, sizeof buffer, "bad standard");
+```
+
+*Good:*
+```
+char buffer[x];
+bytes_read = snprintf (buffer, sizeof (buffer), "good standard");
+```
+
+And it is always good to initialize the char array if the string is static.
+
+E.g.
+```
+char buffer[] = "good standard";
+```
+
No dead or commented code
-------------------------
There must be no dead code (code to which control can never be passed) or
commented out code in the codebase.
-Only one unwind and return per function
---------------------------------------
-
-There must be only one exit out of a function. `UNWIND` and return
-should happen at only point in the function.
-
Function length or Keep functions small
--------------------------------------
@@ -226,20 +424,35 @@ same_owner (posix_lock_t *l1, posix_lock_t *l2)
}
```
-Defining functions as static
----------------------------
+Define functions as static
+--------------------------
+
+Declare functions as static unless they're exposed via a module-level API for
+use from other modules.
+
+No nested functions
+-------------------
+
+Nested functions have proven unreliable, e.g. as callbacks in code that uses
+ucontext (green) threads.
+
+Use inline functions instead of macros whenever possible
+--------------------------------------------------------
-Define internal functions as static only if you're
-very sure that there will not be a crash(..of any kind..) emanating in
-that function. If there is even a remote possibility, perhaps due to
-pointer derefering, etc, declare the function as non-static. This
-ensures that when a crash does happen, the function name shows up the
-in the back-trace generated by libc.
-However, doing so has potential
-for polluting the function namespace, so to avoid conflicts with other
-components in other parts, ensure that the function names are
-prepended with a prefix that identify the component to which it
-belongs. For eg. non-static functions in io-threads translator start
-with iot_.
+Inline functions enforce type safety; macros do not. Use macros only for things
+that explicitly need to be type-agnostic (e.g. cases where one might use
+generics or templates in other languages), or that use other preprocessor
+features such as `#` for stringification or `##` for token pasting. In general,
+"static inline" is the preferred form.
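+
+As an illustration of this guideline (this example is not from the existing
+codebase):
+
+*Bad:*
+
+```
+#define MAX_VAL(a, b) ((a) > (b) ? (a) : (b)) /* may evaluate an arg twice */
+```
+
+*Good:*
+
+```
+static inline int
+max_int (int a, int b)
+{
+    return (a > b) ? a : b; /* type-safe; arguments evaluated once */
+}
+
+/* stringification genuinely needs the preprocessor */
+#define LOG_INT(x) gf_log ("example", GF_LOG_INFO, #x " = %d", (x))
+```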
+
+Avoid copypasta
+---------------
+
+Code that is copied and then pasted into multiple functions often creates
+maintenance problems later, e.g. updating all but one instance for a subsequent
+change. If you find yourself copying the same "boilerplate" many places,
+consider refactoring to use helper functions (including inline) or macros, or
+code generation.
Ensure function calls wrap around after 80-columns
--------------------------------------------------
@@ -335,13 +548,95 @@ pthread_mutex_lock (&mutex);
pthread_mutex_unlock (&mutex);
```
-*A skeleton fop function:*
+### Always use braces
+
+Even around single statements.
+
+*Bad:*
+
+```
+if (condition) action ();
+
+if (condition)
+    action ();
+```
+
+*Good:*
+
+```
+if (condition) {
+    action ();
+}
+```
+
+### Avoid multi-line conditionals
+
+These can be hard to read and even harder to modify later. Predicate functions
+and helper variables are always better for maintainability.
+
+*Bad:*
+
+```
+if ((thing1 && other_complex_condition (thing1, lots, of, args))
+    || (!thing2 || even_more_complex_condition (thing2))
+    || all_sorts_of_stuff_with_thing3) {
+    return;
+}
+
+```
+
+*Better:*
+
+```
+thing1_ok = predicate1 (thing1, lots, of, args);
+thing2_ok = predicate2 (thing2);
+thing3_ok = predicate3 (thing3);
+
+if (!thing1_ok || !thing2_ok || !thing3_ok) {
+    return;
+}
+```
+
+*Best:*
+
+```
+if (thing1 && other_complex_condition (thing1, lots, of, args)) {
+    return;
+}
+if (!thing2 || even_more_complex_condition (thing2)) {
+    /* Note potential for a different message here. */
+    return;
+}
+if (all_sorts_of_stuff_with_thing3) {
+    /* And here too. */
+    return;
+}
+```
+
+### Use 'const' liberally
+
+If a value isn't supposed/expected to change, there's no cost to adding a
+'const' keyword and it will help prevent violation of expectations.
+
+### Avoid global variables (including 'static' auto variables)
+Almost all state in Gluster is contextual and should be contained in the
+appropriate structure reflecting its scope (e.g. `call_frame_t`, `call_stack_t`,
+`xlator_t`, `glusterfs_ctx_t`). With dynamic loading and graph switches in play,
+each global requires careful consideration of when it should be initialized or
+reinitialized, when it might _accidentally_ be reinitialized, when its value
+might become stale, and so on. A few global variables are needed to serve as
+'anchor points' for these structures, and more exceptions to the rule might be
+approved in the future, but new globals should not be added to the codebase
+without explicit approval.
+
+## A skeleton fop function
-This is the recommended template for any fop. In the beginning come
-the initializations. After that, the `success' control flow should be
-linear. Any error conditions should cause a `goto` to a single
-point, `out`.
At that point, the code should detect the error
-that has occurred and do appropriate cleanup.
+This is the recommended template for any fop. In the beginning come the
+initializations. After that, the 'success' control flow should be linear. Any
+error conditions should cause a `goto` to a label at the end. By convention
+this is 'out' if there is only one such label, but a cascade of such labels is
+allowable to support multi-stage cleanup. At that point, the code should detect
+the error that has occurred and do appropriate cleanup.
```
int32_t
diff --git a/doc/developer-guide/commit-guidelines.md b/doc/developer-guide/commit-guidelines.md
new file mode 100644
index 00000000000..38bbe525cbd
--- /dev/null
+++ b/doc/developer-guide/commit-guidelines.md
@@ -0,0 +1,136 @@
+## Git Commit Good Practice
+
+The following document is based on experience doing code development, bug troubleshooting and code review across a number of projects using Git. The document is mostly borrowed from [Open Stack](https://wiki.openstack.org/wiki/GitCommitMessages), but made more meaningful in the context of the GlusterFS project.
+
+This topic can be split into two areas of concern:
+
+* The structured set/split of the code changes
+* The information provided in the commit message
+
+### Executive Summary
+The points and examples that will be raised in this document ought to clearly demonstrate the value in splitting up changes into a sequence of individual commits, and the importance of writing good commit messages to go along with them. If these guidelines were widely applied, it would result in a significant improvement in the quality of the GlusterFS Git history. Both a carrot & stick will be required to effect changes. This document intends to be the carrot by alerting people to the benefits, while anyone doing Gerrit code review can act as the stick ;-P
+
+In other words, when reviewing a change in Gerrit:
+* Do not simply look at the correctness of the code.
+* Review the commit message itself and request improvements to its content.
+* Look out for commits which are mixing multiple logical changes and require the submitter to split them into separate commits.
+* Ensure whitespace changes are not mixed in with functional changes.
+* Ensure no-op code refactoring is done separately from functional changes.
+
+And so on.
+
+It might be mentioned that Gerrit's handling of patch series is not entirely perfect. Let that not become a valid reason to avoid creating patch series. The tools being used should be subservient to developers' needs, and since they are open source they can be fixed / improved. Software source code is "read mostly, write occasionally" and thus the most important criterion is to improve the long-term maintainability by the large pool of developers in the community, and not to sacrifice too much for the sake of the single author who may never touch the code again.
+
+And now the long detailed guidelines & examples of good & bad practice.
+
+### Structural split of changes
+The cardinal rule for creating good commits is to ensure there is only one "logical change" per commit. There are many reasons why this is an important rule:
+
+* The smaller the amount of code being changed, the quicker & easier it is to review & identify potential flaws.
+* If a change is found to be flawed later, it may be necessary to revert the broken commit. This is much easier to do if there are no other unrelated code changes entangled with the original commit.
+* When troubleshooting problems using Git's bisect capability, small well-defined changes will aid in isolating exactly where the code problem was introduced.
+* When browsing history using Git annotate/blame, small well-defined changes also aid in isolating exactly where & why a piece of code came from.
+
+#### Things to avoid when creating commits
+With the above points in mind, there are some commonly encountered examples of bad things to avoid:
+
+* Mixing whitespace changes with functional code changes.
+
+The whitespace changes will obscure the important functional changes, making it harder for a reviewer to correctly determine whether the change is correct. Solution: Create 2 commits, one with the whitespace changes, one with the functional changes. Typically the whitespace change would be done first, but that need not be a hard rule.
+
+* Mixing two unrelated functional changes.
+
+Again the reviewer will find it harder to identify flaws if two unrelated changes are mixed together. If it becomes necessary to later revert a broken commit, the two unrelated changes will need to be untangled, with further risk of bug creation.
+
+* Sending large new features in a single giant commit.
+
+It may well be the case that the code for a new feature is only useful when all of it is present. This does not, however, imply that the entire feature should be provided in a single commit. New features often entail refactoring existing code. It is highly desirable that any refactoring is done in commits which are separate from those implementing the new feature. This helps reviewers and test suites validate that the refactoring has no unintentional functional changes.
+
+Even the newly written code can often be split up into multiple pieces that can be independently reviewed. For example, changes which add new internal fops or library functions can be in self-contained commits. Again this leads to easier code review. It also allows other developers to cherry-pick small parts of the work, if the entire new feature is not immediately ready for merge. This will encourage the author & reviewers to think about the generic library functions' design, and not simply pick a design that is easier for their currently chosen internal implementation.
+
+The basic rule to follow is:
+
+If a code change can be split into a sequence of patches/commits, then it should be split. Less is not more. More is more.
+
+##### Examples of bad practice
+
+TODO: Pick glusterfs specific example.
+
+
+##### Examples of good practice
+
+
+### Information in commit messages
+As important as the content of the change is the content of the commit message describing it. When writing a commit message there are some important things to remember:
+
+* Do not assume the reviewer understands what the original problem was.
+
+When reading bug reports, after a number of back & forth comments, it is often as clear as mud what the root-cause problem is. The commit message should have a clear statement as to what the original problem is. The bug is merely interesting historical background on /how/ the problem was identified. It should be possible to review a proposed patch for correctness without needing to read the bug ticket.
+
+* Do not assume the reviewer has access to external web services/sites.
+
+In 6 months' time when someone is on a train/plane/coach/beach/pub troubleshooting a problem & browsing Git history, there is no guarantee they will have access to the online bug tracker, or online blueprint documents.
The great step forward with distributed SCM is that you no longer need to be "online" to have access to all information about the code repository. The commit message should be totally self-contained, to maintain that benefit.
+
+* Do not assume the code is self-evident/self-documenting.
+
+What is self-evident to one person might be clear as mud to another person. Always document what the original problem was and how it is being fixed, for any change except the most obvious typos or whitespace-only commits.
+
+* Describe why a change is being made.
+
+A common mistake is to just document how the code has been written, without describing /why/ the developer chose to do it that way. By all means describe the overall code structure, particularly for large changes, but more importantly describe the intent/motivation behind the changes.
+
+* Read the commit message to see if it hints at improved code structure.
+
+Often while writing a large commit message, it becomes obvious that a commit should have in fact been split into 2 or more parts. Don't be afraid to go back and rebase the change to split it up into separate commits.
+
+* Ensure sufficient information to decide whether to review.
+
+When Gerrit sends out email alerts for new patch submissions there is minimal information included, principally the commit message and the list of files changed. Given the high volume of patches, it is not reasonable to expect all reviewers to examine the patches in detail. The commit message must thus contain sufficient information to alert the potential reviewers to the fact that this is a patch they need to look at.
+
+* The first commit line is the most important.
+
+In Git commits the first line of the commit message has special significance. It is used as the email subject line, git annotate messages, gitk viewer annotations, merge commit messages and many more places where space is at a premium. As well as summarizing the change itself, it should take care to detail what part of the code is affected, e.g. if it is 'afr', 'dht' or any translator. Or in some cases, it can be touching all these components, but the commit message can be 'coverity:', 'txn-framework:', 'new-fop: ', etc.
+
+* Describe any limitations of the current code.
+
+If the code being changed still has future scope for improvements or any known limitations, then mention these in the commit message. This demonstrates to the reviewer that the broader picture has been considered and what tradeoffs have been made in terms of short-term goals vs. long-term wishes.
+
+* Do not include patch set-specific comments.
+
+In other words, if you rebase your change please don't add "Patch set 2: rebased" to your commit message. That isn't going to be relevant once your change has merged. Please do make a note of that in Gerrit as a comment on your change, however. It helps reviewers know what changed between patch sets. This also applies to comments such as "Added unit tests", "Fixed localization problems", or any other such patch-set-to-patch-set changes that don't affect the overall intent of your commit.
+
+**The main rule to follow is:**
+
+The commit message must contain all the information required to fully understand & review the patch for correctness. Less is not more. More is more.
+
+
+#### Including external references
+
+The commit message is primarily targeted towards human interpretation, but there is always some metadata provided for machine use.
In the case of GlusterFS this includes at least the 'Change-id', "bug"/"feature" ID references and "Signed-off-by" tag (generated by 'git commit -s'). + +The 'Change-id' line is a unique hash describing the change, which is generated by a Git commit hook. This should not be changed when rebasing a commit following review feedback, since it is used by Gerrit, to track versions of a patch. + +The 'bug' line can reference a bug in a few ways. Gerrit creates a link to the bug when viewing the patch on review.gluster.org so that reviewers can quickly access the bug/issue on Bugzilla or Github. + +**Fixes: bz#1601166** -- use 'Fixes: bz#NNNNN' if the commit is intended to fully fix and close the bug being referenced. +**Fixes: #411** -- use 'Fixes: #NNN' if the patch fixes the github issue completely. + +**Updates: bz#1193929** -- use 'Updates: bz#NNNN' if the commit is only a partial fix and more work is needed. +**Updates: #175** -- use 'Updates: #NNNN' if the commit is only a partial fix and more work is needed for the feature completion. + +We encourage the use of `Co-Authored-By: name <name@example.com>` in commit messages to indicate people who worked on a particular patch. It's a convention for recognizing multiple authors, and our projects would encourage the stats tools to observe it when collecting statistics. + +### Summary of Git commit message structure + +* Provide a brief description of the change in the first line. +* The first line should be limited to 50 characters and should not end with a period. + +* Insert a single blank line after the first line. + +* Provide a detailed description of the change in the following lines, breaking paragraphs where needed. + +* Subsequent lines should be wrapped at 72 characters. + +Put the 'Change-id', 'Fixes bz#NNNNN' and 'Signed-off-by: <>' lines at the very end. + +TODO: Add good examples diff --git a/doc/developer-guide/datastructure-inode.md b/doc/developer-guide/datastructure-inode.md index a340ab9ca8e..45d7a941e5f 100644 --- a/doc/developer-guide/datastructure-inode.md +++ b/doc/developer-guide/datastructure-inode.md @@ -1,6 +1,6 @@ -#Inode and dentry management in GlusterFS: +# Inode and dentry management in GlusterFS: -##Background +## Background Filesystems internally refer to files and directories via inodes. Inodes are unique identifiers of the entities stored in a filesystem. Whenever an application has to operate on a file/directory (read/modify), the filesystem @@ -41,11 +41,10 @@ struct _inode_table { }; ``` -#Life-cycle +# Life-cycle ``` - inode_table_new (size_t lru_limit, xlator_t *xl) - +``` This is a function which allocates a new inode table. Usually the top xlators in the graph such as protocol/server (for bricks), fuse and nfs (for fuse and nfs mounts) and libgfapi do inode managements. Hence they are the ones which will @@ -59,11 +58,8 @@ new inode table. Thus an allocated inode table is destroyed only when the filesystem daemon is killed or unmounted. -``` - -#what it contains. -``` +# what it contains. Inode table in glusterfs mainly contains a hash table for maintaining inodes. In general a file/directory is considered to be existing if there is a corresponding inode present in the inode table. If a inode for a file/directory @@ -76,21 +72,21 @@ size of the hash table (as of now it is hard coded to 14057. The hash value of a inode is calculated using its gfid). Apart from the hash table, inode table also maintains 3 important list of inodes -1) Active list: +1. 
Active list: Active list contains all the active inodes (i.e inodes which are currently part of some fop). -2) Lru list: +2. Lru list: Least recently used inodes list. A limit can be set for the size of the lru list. For bricks it is 16384 and for clients it is infinity. -3) Purge list: +3. Purge list: List of all the inodes which have to be purged (i.e inodes which have to be deleted from the inode table due to unlink/rmdir/forget). And at last it also contains the mem-pool for allocating inodes, dentries so that frequent malloc/calloc and free of the data structures can be avoided. -``` -#Data structure (inode) + +# Data structure (inode) ``` struct _inode { inode_table_t *table; /* the table this inode belongs to */ @@ -108,7 +104,7 @@ struct _inode { struct _inode_ctx *_ctx; /* place holder for keeping the information about the inode by different xlators */ }; - +``` As said above, inodes are internal way of identifying the files/directories. A inode uniquely represents a file/directory. A new inode is created whenever a create/mkdir/symlink/mknod operations are performed. Apart from that a new inode @@ -128,9 +124,9 @@ inodes are those inodes whose refcount is greater than zero. Whenever some operation comes on a file/directory, and the resolver tries to find the inode for it, it increments the refcount of the inode before returning the inode. The refcount of an inode can be incremented by calling the below function - +``` inode_ref (inode_t *inode) - +``` Any xlator which wants to operate on a inode as part of some fop (or wants the inode in the callback), should hold a ref on the inode. Once the fop is completed before sending the reply of the fop to the above @@ -139,18 +135,18 @@ zero, it is removed from the active inodes list and put into LRU list maintained by the inode table. Thus in short if some fop is happening on a file/directory, the corresponding inode will be in the active list or it will be in the LRU list. -``` -#Life Cycle + +# Life Cycle A new inode is created whenever a new file/directory/symlink is created OR a successful lookup of an existing entry is done. The xlators which does inode management (as of now protocol/server, fuse, nfs, gfapi) will perform inode_link operation upon successful lookup or successful creation of a new entry. - +``` inode_link (inode_t *inode, inode_t *parent, const char *name, struct iatt *buf); - +``` inode_link actually adds the inode to the inode table (to be precise it adds the inode to the hash table maintained by the inode table. The hash value is calculated based on the gfid). Copies the gfid to the inode (the gfid is @@ -160,7 +156,7 @@ A inode is removed from the inode table and eventually destroyed when unlink or rmdir operation is performed on a file/directory, or the the lru limit of the inode table has been exceeded. -#Data structure (dentry) +# Data structure (dentry) ``` struct _dentry { @@ -170,22 +166,22 @@ struct _dentry { char *name; /* name of the directory entry */ inode_t *parent; /* directory of the entry */ }; - +``` A dentry is the presence of an entry for a file/directory within its parent directory. A dentry usually points to the inode to which it belongs to. In glusterfs a dentry contains the following fields. -1) a hook using which it can add itself to the list of +1. a hook using which it can add itself to the list of the dentries maintained by the inode to which it points to. -2) A hash table pointer. -3) Pointer to the inode to which it belongs to. 
-4) Name of the dentry -5) Pointer to the inode of the parent directory in which the dentry is present +2. A hash table pointer. +3. Pointer to the inode to which it belongs to. +4. Name of the dentry +5. Pointer to the inode of the parent directory in which the dentry is present A new dentry is created when a new file/directory/symlink is created or a hard link to an existing file is created. - +``` __dentry_create (inode_t *inode, inode_t *parent, const char *name); - +``` A dentry holds a refcount on the parent directory so that the parent inode is never removed from the active inode's list and put to the lru list (If the lru limit of the lru list is exceeded, there is @@ -212,15 +208,14 @@ deleted due to file removal or lru limit being exceeded the inode is retired purge list maintained by the inode table), the nlookup count is set to 0 via inode_forget api. The inode table, then prunes all the inodes from the purge list by destroying the inode contexts maintained by each xlator. - +``` unlinking of the dentry is done via inode_unlink; void inode_unlink (inode_t *inode, inode_t *parent, const char *name); - +``` If the inode has multiple hard links, then the unlink operation performed by the application results just in the removal of the dentry with the name provided by the application. For the inode to be removed, all the dentries of the inode should be unlinked. -``` diff --git a/doc/developer-guide/datastructure-iobuf.md b/doc/developer-guide/datastructure-iobuf.md index 5f521f1485f..03604e3672c 100644 --- a/doc/developer-guide/datastructure-iobuf.md +++ b/doc/developer-guide/datastructure-iobuf.md @@ -1,6 +1,6 @@ -#Iobuf-pool -##Datastructures -###iobuf +# Iobuf-pool +## Datastructures +### iobuf Short for IO Buffer. It is one allocatable unit for the consumers of the IOBUF API, each unit hosts @page_size(defined in arena structure) bytes of memory. As initial step of processing a fop, the IO buffer passed onto GlusterFS by the @@ -28,7 +28,7 @@ struct iobuf { }; ``` -###iobref +### iobref There may be need of multiple iobufs for a single fop, like in vectored read/write. Hence multiple iobufs(default 16) are encapsulated under one iobref. ``` @@ -40,7 +40,7 @@ struct iobref { int used; /* number of iobufs added to this iobref */ }; ``` -###iobuf_arenas +### iobuf_arenas One region of memory MMAPed from the operating system. Each region MMAPs @arena_size bytes of memory, and hosts @arena_size / @page_size IOBUFs. The same sized iobufs are grouped into one arena, for sanity of access. @@ -77,7 +77,7 @@ struct iobuf_arena { }; ``` -###iobuf_pool +### iobuf_pool Pool of Iobufs. As there may be many Io buffers required by the filesystem, a pool of iobufs are preallocated and kept, if these preallocated ones are exhausted only then the standard malloc/free is called, thus improving the @@ -139,8 +139,8 @@ arenas in the purge list are destroyed only if there is atleast one arena in (e.g: If there is an arena (page_size=128KB, count=32) in purge list, this arena is destroyed(munmap) only if there is an arena in 'arenas' list with page_size=128KB). -##APIs -###iobuf_get +## APIs +### iobuf_get ``` struct iobuf *iobuf_get (struct iobuf_pool *iobuf_pool); @@ -149,7 +149,7 @@ Creates a new iobuf of the default page size(128KB hard coded as of yet). Also takes a reference(increments ref count), hence no need of doing it explicitly after getting iobuf. 
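+
+A short usage sketch of the API described above (a hedged illustration; it
+assumes the pool is reachable through the process context as
+`this->ctx->iobuf_pool`, and error handling is abbreviated):
+```
+struct iobuf *iob = iobuf_get (this->ctx->iobuf_pool);
+
+if (!iob)
+        goto err;          /* allocation failed */
+
+/* iobuf_get already took a reference; fill and use the buffer here... */
+
+iobuf_unref (iob);         /* drop our reference once done */
+```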
-###iobuf_get2 +### iobuf_get2 ``` struct iobuf * iobuf_get2 (struct iobuf_pool *iobuf_pool, size_t page_size); @@ -179,7 +179,7 @@ if (requested iobuf size > Max iobuf size in the pool(1MB as of yet)) Also takes a reference(increments ref count), hence no need of doing it explicitly after getting iobuf. -###iobuf_ref +### iobuf_ref ``` struct iobuf *iobuf_ref (struct iobuf *iobuf); @@ -188,7 +188,7 @@ struct iobuf *iobuf_ref (struct iobuf *iobuf); xlator/function/, its a good practice to take a reference so that iobuf is not deleted by the allocator. -###iobuf_unref +### iobuf_unref ``` void iobuf_unref (struct iobuf *iobuf); ``` @@ -203,33 +203,33 @@ Unreference the iobuf, if the ref count is zero iobuf is considered free. Every iobuf_ref should have a corresponding iobuf_unref, and also every iobuf_get/2 should have a correspondning iobuf_unref. -###iobref_new +### iobref_new ``` struct iobref *iobref_new (); ``` Creates a new iobref structure and returns its pointer. -###iobref_ref +### iobref_ref ``` struct iobref *iobref_ref (struct iobref *iobref); ``` Take a reference on the iobref. -###iobref_unref +### iobref_unref ``` void iobref_unref (struct iobref *iobref); ``` Decrements the reference count of the iobref. If the ref count is 0, then unref all the iobufs(iobuf_unref) in the iobref, and destroy the iobref. -###iobref_add +### iobref_add ``` int iobref_add (struct iobref *iobref, struct iobuf *iobuf); ``` Adds the given iobuf into the iobref, it takes a ref on the iobuf before adding it, hence explicit iobuf_ref is not required if adding to the iobref. -###iobref_merge +### iobref_merge ``` int iobref_merge (struct iobref *to, struct iobref *from); ``` @@ -239,13 +239,13 @@ on all the iobufs added to the 'to' iobref. Hence iobref_unref should be performed both on 'from' and 'to' iobrefs (performing iobref_unref only on 'to' will not free the iobufs and may result in leak). -###iobref_clear +### iobref_clear ``` void iobref_clear (struct iobref *iobref); ``` Unreference all the iobufs in the iobref, and also unref the iobref. -##Iobuf Leaks +## Iobuf Leaks If all iobuf_refs/iobuf_new do not have correspondning iobuf_unref, then the iobufs are not freed and recurring execution of such code path may lead to huge memory leaks. The easiest way to identify if a memory leak is caused by iobufs diff --git a/doc/developer-guide/datastructure-mem-pool.md b/doc/developer-guide/datastructure-mem-pool.md index c71aa2a8ddd..225567cbf9f 100644 --- a/doc/developer-guide/datastructure-mem-pool.md +++ b/doc/developer-guide/datastructure-mem-pool.md @@ -1,5 +1,5 @@ -#Mem-pool -##Background +# Mem-pool +## Background There was a time when every fop in glusterfs used to incur cost of allocations/de-allocations for every stack wind/unwind between xlators because stack/frame/*_localt_t in every wind/unwind was allocated and de-allocated. Because of all these system calls in the fop path there was lot of latency and the worst part is that most of the times the number of frames/stacks active at any time wouldn't cross a threshold. So it was decided that this threshold number of frames/stacks would be allocated in the beginning of the process only once. Get one of them from the pool of stacks/frames whenever `STACK_WIND` is performed and put it back into the pool in `STACK_UNWIND`/`STACK_DESTROY` without incurring any extra system calls. The data structures are allocated only when threshold number of such items are in active use i.e. 
pool is in complete use. The % increase in the performance once this was added to all the common data structures (inode/fd/dict etc.) in xlators throughout the stack was tremendous.
## Data structure
@@ -27,7 +27,7 @@ will be served from here until all the elements in the pool are in use i.e. cold
};
```
-##Life-cycle
+## Life-cycle
```
mem_pool_new (data_type, unsigned long count)
@@ -120,5 +120,5 @@ mem_pool_destroy (struct mem_pool *pool)
Deletes this pool from the `global_list` maintained by `glusterfs-ctx` and frees all the memory allocated in `mem_pool_new`.
-###How to pick pool-size
+### How to pick pool-size
This varies from work-load to work-load. Create the mem-pool with some random size and run the work-load. Take the statedump after the work-load is complete. In the statedump, if `max_alloc` is always less than `cold_count`, maybe reduce the size of the pool closer to `max_alloc`. On the other hand, if there are lots of `pool-misses`, then increase the `pool_size` by `max_stdalloc` to achieve a better 'hit-rate' of the pool.
diff --git a/doc/developer-guide/dirops-transactions-in-dht.md b/doc/developer-guide/dirops-transactions-in-dht.md
new file mode 100644
index 00000000000..909a97001aa
--- /dev/null
+++ b/doc/developer-guide/dirops-transactions-in-dht.md
@@ -0,0 +1,273 @@
+# dirops transactions in dht
+The need for transactions during operations on directories arises from two
+basic design elements of DHT:
+
+ 1. A directory is created on all subvolumes of dht. Since glusterfs
+ associates each file-system object with a unique gfid, every
+ subvolume should have the same unique mapping of (path of directory,
+ gfid). To elaborate,
+ * Each subvolume should have the same gfid associated with a path to a
+ directory.
+ * A gfid should not be associated with more than one path in any
+ subvolume.
+
+ So, entire operations like mkdir, renamedir, rmdir and creation of
+ directories during self-heal need to be atomic in dht. In other words,
+ any of these operations shouldn't begin on an inode if one of them is
+ already in progress on the same inode, till it completes on all
+ subvolumes of dht. If not, more than one of these operations
+ happening in parallel can break either or both of the two requirements
+ listed above. This is referred to in the rest of the document by the
+ name _Atomicity during namespace operations_.
+
+ 2. Each directory has an independent layout persisted on
+ subvolumes. Each subvolume contains only the part of the layout relevant
+ to it. For performance reasons _and_ since _only_ dht has the aggregated
+ view, this layout is cached in the memory of the client. To make sure dht
+ reads or modifies a single complete layout while parallel modifications of the layout are in progress, we need atomicity during layout modification and reading. This is referred to in the rest of the document as _Atomicity during layout modification and reading_.
+
+The rest of the document explains how atomicity is achieved for each of
+the cases above.
+
+**Atomicity during layout modification and reading**
+File operations, a.k.a. fops, can be classified into two categories based on how they consume the layout.
+
+ - Layout writer. Setting of the layout during self-heal of a directory is
+ a layout writer of _that_ directory.
+ - Layout reader.
+ * Any entry fop like create, unlink, rename, link, symlink,
+ mknod, mkdir, rmdir, renamedir which needs the layout of the parent directory. Each of these fops is a reader of the layout on the parent directory.
+ * setting of the layout during mkdir of a directory is considered
+ a reader of the same directory's layout. The reason for this is that
+ only a parallel lookup on that directory can be a competing fop that modifies the layout (other fops need the gfid of the directory, which can be obtained only after either lookup or mkdir completes). However, healing of the layout is considered a writer, and a single writer blocks all readers.
+
+*Algorithm*
+Atomicity is achieved by locking on the inode of the directory whose
+layout is being modified or read. The fop used is inodelk.
+ - Writer acquires a blocking inodelk (directory-inode, write-lock) on
+ all subvolumes serially. The order of subvols in which they are
+ locked by different clients remains constant for a directory. If locking fails on any subvolume, layout modification is abandoned.
+ - Reader acquires an inodelk (directory-inode, read-lock) on _any_
+ one subvolume. If locking fails on a subvolume (say with an
+ ESTALE/ENOTCONN error), locking can be tried on other subvolumes till
+ we get one lock. If we cannot get a lock on at least one subvolume,
+ consistency of the layout is not guaranteed. Based on the consistency
+ requirements of fops, they can be failed or continued.
+
+Reasons why the writer has to lock on _all_ subvols:
+
+ - DHT doesn't have a metadata server, and locking is implemented by the brick. So, there is no well-defined subvol/brick that can be used as an arbitrator by different clients while acquiring locks.
+ - Readers should acquire as few locks as possible. In
+ other words, the algorithm aims to have a lower synchronization cost for
+ readers.
+ - The subvolume to which a directory hashes could be used as a
+ lock server. However, in the case of an entry fop like create
+ (/a/b/c) where we want to block modification of the layout of b for the
+ duration of create, we would be required to acquire a lock on the
+ subvol to which /a/b hashes. To find out the hashed-subvol of
+ /a/b, we would need the layout of /a. Note how there is a dependency
+ on locking the layouts of ancestors all the way to the root. So this
+ locking is not preferred. Also, note that only the immediate parent
+ inode is available in the arguments of a fop like create.
+
+**Atomicity during namespace operations**
+
+ - We use locks on the inode of the parent directory in the namespace of
+ _"basename"_ during mkdir, rmdir, renamedir and the directory
+ creation phase of self-heal. The exact fop we use is _entrylk
+ (parent-inode, "basename")_.
+ - refresh in-memory layout of parent-inode from values stored on backend
+ - _entrylk (parent-inode, "basename")_ is done on the subvolume to which
+ _"basename" hashes_. So, this operation is a _reader_ of the
+ layout on _parent-inode_. Which means an _inodelk (parent-inode,
+ read-lock)_ has to be completed before _entrylk (parent-inode,
+ "basename")_ is issued. Both the locks have to be held till the
+ operation is tried on all subvolumes. If acquiring any/all of
+ these locks fails, the operation should be failed.
+
+With the above details, the algorithms for mkdir, rmdir, renamedir and
+self-heal of a directory are explicitly detailed below.
+
+**Self-heal of a directory**
+
+ - Creation of directories on subvolumes is done only during
+ _named-lookup_ of a directory, as we need < parent-inode,
+ "basename" >.
+ - If a directory is missing on one or more subvolumes,
+ * acquire _inodelk (parent-inode, read-lock)_ on _any one_ of the
+ subvolumes.
+ * refresh the in-memory layout of parent-inode from values stored on backend
+ * acquire _entrylk (parent-inode, "basename")_ on the subvolume
+ to which _"basename"_ hashes.
+ * If any/all of the locks fail, self-heal is aborted.
+ * create directories on the missing subvolumes.
+ * release _entrylk (parent-inode, "basename")_.
+ * release _inodelk (parent-inode, read-lock)_.
+
+ - If the layout of a directory needs healing,
+ * acquire _inodelk (directory-inode, write-lock)_ on _all_ the
+ subvolumes. If locking fails on any of the subvolumes,
+ self-heal is aborted. Blocking locks are acquired serially across subvolumes in a _well-defined_ order which is _constant_ across all the healers of a directory. One order could be the order in which subvolumes are stored in the array _children_ of the dht xlator.
+ * heal the layout.
+ * release _inodelk (directory-inode, write-lock)_ on _all_ the
+ subvolumes in parallel.
+ * Note that healing of the layout can be done in both _named_ and
+ _nameless_ lookups of a directory, as _only directory-inode_ is needed
+ for healing and it is available during both.
+
+**mkdir (parent-inode, "basename")**
+
+* while creating the directory across subvolumes (see the sketch after this section),
+
+ - acquire _inodelk (parent-inode, read-lock)_ on _any one_ of the
+ subvolumes.
+ - refresh in-memory layout of parent-inode from values stored on backend
+ - acquire _entrylk (parent-inode, "basename")_ on the subvolume to
+ which _"basename"_ hashes.
+ - If any/all of the above two locks fail, release the locks that
+ were acquired successfully and mkdir should be failed (as perceived by the application).
+ - do _mkdir (parent-inode, "basename")_ on the subvolume to which
+ _"basename"_ hashes. If this mkdir fails, mkdir is failed.
+ - do _mkdir (parent-inode, "basename")_ on the remaining subvolumes.
+ - release _entrylk (parent-inode, "basename")_.
+ - release _inodelk (parent-inode, read-lock)_.
+* while setting the layout of a directory,
+ - acquire _inodelk (directory-inode, read-lock)_ on _any one_ of the
+ subvolumes.
+ - If locking fails, clean up the locks that were acquired
+ successfully and abort the layout setting. Note that we'll have a
+ directory without a layout till a lookup happens on the
+ directory. This means entry operations within this directory fail
+ in this time window. We can also consider failing mkdir. The
+ problem of dealing with a directory without a layout is out of the
+ scope of this document.
+ - set the layout on _directory-inode_.
+ - release _inodelk (directory-inode, read-lock)_.
+* Note that during layout setting we consider mkdir a _reader_, not a
+ _writer_, though it is setting the layout. The reasons are:
+ - Before any of the other readers like create, link etc. can operate on
+ this directory, the _gfid_ of this directory has to be
+ resolved. But the _gfid_ is available only if either of the following
+ conditions is true:
+ * mkdir is complete.
+ * a lookup on the same path happens in parallel to the in-progress
+ mkdir.
+
+ But, on completion of either of the above two operations, the layout
+ will be healed. So, none of the _readers_ will happen on a
+ directory with a partial layout.
+
+* Note that since we have an _entrylk (parent-inode, "basename")_ for
+ the entire duration of (attempting) creating directories, parallel
+ mkdirs will no longer contend on _mkdir_ on the subvolume _to which "basename" hashes_, but instead contend on _entrylk (parent-inode, "basename")_ on the subvolume _to which "basename" hashes_. So, we can attempt the _mkdir_ in _parallel_ on all subvolumes instead of a two-stage mkdir on the hashed subvolume first and the rest of them in parallel later. However, we need to make sure that mkdir is successful on the subvolume _to which "basename" hashes_ for mkdir to be successful (as perceived by the application). In the case of a failed mkdir (as perceived by the application), a cleanup should be performed on all the subvolumes before _entrylk (parent-inode, "basename")_ is released.
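+
+The mkdir flow above, condensed into a hedged pseudo-C sketch. Every helper
+name here (`dht_inodelk_read`, `dht_entrylk`, `dht_refresh_layout`,
+`dht_do_mkdir`, and the release helpers) is hypothetical, chosen only to mirror
+the steps; this is not the actual DHT code:
+```
+/* illustrative only: lock order for the dht mkdir transaction */
+int
+dht_mkdir_txn (xlator_t *any_subvol, xlator_t *hashed_subvol,
+               inode_t *parent, const char *name)
+{
+        int ret = -1;
+
+        if (dht_inodelk_read (any_subvol, parent))      /* reader of layout */
+                goto out;
+        dht_refresh_layout (parent);                    /* re-read from backend */
+        if (dht_entrylk (hashed_subvol, parent, name))  /* namespace lock */
+                goto unlock_inodelk;
+
+        /* mkdir must succeed on the hashed subvolume for the fop to
+         * succeed; the remaining subvolumes are attempted afterwards */
+        ret = dht_do_mkdir (hashed_subvol, parent, name);
+
+        dht_release_entrylk (hashed_subvol, parent, name);
+unlock_inodelk:
+        dht_release_inodelk_read (any_subvol, parent);
+out:
+        return ret;
+}
+```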
+
+**rmdir (parent-inode, "basename", directory-inode)**
+
+ - acquire _inodelk (parent-inode, read-lock)_ on _any one_
+ subvolume.
+ - refresh in-memory layout of parent-inode from values stored on backend
+ - acquire _entrylk (parent-inode, "basename")_ on the subvolume to
+ which _"basename" hashes_.
+ - If any/all of the above locks fail, rmdir is failed after cleanup
+ of the locks that were acquired successfully.
+ - do _rmdir (parent-inode, "basename")_ on the subvolumes to which
+ _"basename" doesn't hash to_.
+ * If successful, continue.
+ * Else,
+ * recreate directories on those subvolumes where rmdir
+ succeeded.
+ * heal the layout of _directory-inode_. Note that this will have the
+ same synchronization requirements as discussed in the layout-healing part of the section "Self-heal of a directory" above.
+ * release _entrylk (parent-inode, "basename")_.
+ * release _inodelk (parent-inode, read-lock)_.
+ * fail _rmdir (parent-inode, "basename")_ to the application.
+ - do _rmdir (parent-inode, "basename")_ on the subvolume to which
+ _"basename" hashes_.
+ - If successful, continue.
+ - Else, go to the failure part of _rmdir (parent-inode, "basename")_
+ on subvolumes to which "basename" _doesn't hash to_.
+ - release _entrylk (parent-inode, "basename")_.
+ - release _inodelk (parent-inode, read-lock)_.
+ - return success to the application.
+
+**renamedir (src-parent-inode, "src-basename", src-directory-inode, dst-parent-inode, "dst-basename", dst-directory-inode)**
+
+ - The requirement is to prevent any operation in both the _src-namespace_
+ and the _dst-namespace_. So, we need to acquire locks on both
+ namespaces. We also need to have a constant ordering while acquiring
+ locks during parallel renames of the form _rename (src, dst)_ and
+ _rename (dst, src)_ to prevent deadlocks. We can sort the gfids of
+ _src-parent-inode_ and _dst-parent-inode_ and use that order to
+ acquire locks, as sketched below. For the sake of explanation let's say we ended up
+ with the order of _src_ followed by _dst_.
+ - acquire _inodelk (src-parent-inode, read-lock)_.
+ - refresh in-memory layout of src-parent-inode from values stored on backend
+ - acquire _entrylk (src-parent-inode, "src-basename")_.
+ - acquire _inodelk (dst-parent-inode, read-lock)_.
+ - refresh in-memory layout of dst-parent-inode from values stored on backend
+ - acquire _entrylk (dst-parent-inode, "dst-basename")_.
+ - If acquiring any/all of the locks above fails,
+ * release the locks that were successfully acquired.
+ * fail the renamedir operation to the application.
+ * done
+ - do _renamedir ("src", "dst")_ on the subvolume _to which "dst-basename" hashes_.
+ * If failure, go to the point _If acquiring any/all of the locks above fails_.
+ * else, continue.
+ - do _renamedir ("src", "dst")_ on the rest of the subvolumes.
+ * If there is any failure,
+ * revert the successful renames.
+ * go to the point _If acquiring any/all of the locks above fails_.
+ * else,
+ - release all the locks acquired.
+ - return renamedir as success to the application.
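+
+A hedged sketch of the deadlock-free lock-ordering step mentioned above, using
+`uuid_compare` from libuuid on the parents' gfids (variable names are
+illustrative):
+```
+/* Pick a stable, client-independent order for locking the two parent
+ * namespaces, so that parallel rename (src, dst) and rename (dst, src)
+ * cannot deadlock. */
+inode_t *first = src_parent;
+inode_t *second = dst_parent;
+
+if (uuid_compare (src_parent->gfid, dst_parent->gfid) > 0) {
+        first = dst_parent;
+        second = src_parent;
+}
+/* acquire inodelk and entrylk on 'first', then on 'second' */
+```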
+
+**Some examples of races**
+This section gives concrete examples of races that can result in the inconsistencies explained at the beginning of the document.
+
+Some assumptions are:
+
+* We consider an example distribute of three subvols s1, s2 and s3.
+* For examples of renamedir ("src", "dst"), _src_ hashes to s1 and _dst_ hashes to s2. _src_ and _dst_ are associated with _gfid-src_ and _gfid-dst_ respectively.
+* For non-renamedir examples, _dir_ is the name of the directory and it hashes to s1.
+
+And the examples are:
+
+ - mkdir vs rmdir - inconsistency in namespace.
+ * mkdir ("dir", gfid1) is complete on s1.
+ * rmdir is issued on the same directory. Note that since rmdir needs a gfid, a lookup should be complete before rmdir; the lookup creates the directory on the rest of the subvols as part of self-heal.
+ * rmdir (gfid1) deletes the directory from all subvols.
+ * A new mkdir ("dir", gfid2) is issued. It is successful on s1, associating "dir" with gfid2.
+ * mkdir ("dir", gfid1) resumes and creates the directory on s2 and s3, associating "dir" with gfid1.
+ * mkdir ("dir", gfid2) fails with EEXIST on s2 and s3. Since EEXIST errors are ignored, mkdir is reported successful to the application.
+ * In this example we have multiple inconsistencies:
+ * "dir" is associated with gfid1 on s2, s3 and with gfid2 on s1.
+ * Even if mkdir ("dir", gfid2) was not issued, we would have a case of a directory magically reappearing after a successful rmdir.
+ - lookup heal vs rmdir
+ * rmdir ("dir", gfid1) is issued. It is successful on s2 and s3 (non-hashed subvols for name "dir").
+ * lookup ("dir") is issued. Since the directory is still present on s1, it is created on s2 and s3, associating it with gfid1, as part of self-heal.
+ * rmdir ("dir", gfid1) is complete on s1 and it is successful.
+ * Another lookup ("dir") creates the directory on s1 too.
+ * "dir" magically reappears after a successful rmdir.
+ - lookup heal (src) vs renamedir ("src", "dst")
+ * renamedir ("src", "dst") is complete on s2.
+ * lookup ("src") recreates _src_ with _gfid-src_ on s2.
+ * renamedir ("src", "dst") completes on s1, s3. After the rename is complete, path _dst_ will be associated with gfid _gfid-src_.
+ * Another lookup ("src") recreates _src_ on subvols s1 and s3, associating it with gfid _gfid-src_.
+ * Inconsistencies are:
+ * after a successful renamedir ("src", "dst"), both src and dst exist.
+ * Two directories - src and dst - are associated with the same gfid. One common symptom is some entries (of the earlier _src_ and current _dst_ directory) being missed in the readdir listing, as the gfid handle might point to the empty healed directory rather than the actual directory containing the entries.
+ - lookup heal (dst) vs renamedir ("src", "dst")
+ * dst exists and is empty when renamedir started
+ * dst doesn't exist when renamedir started
+ - renamedir ("src", "dst") is complete on s2 and s3.
+ - lookup ("dst") creates _dst_, associating it with _gfid-src_, on s1.
+ - An entry is created in _dst_ on s1.
+ - renamedir ("src", "dst") on s1 will result in a directory _dst/dst_, as _dst_ is no longer empty and _man 2 rename_ states that if _dst_ is not empty, _src_ is renamed _as a subdirectory of dst_.
+ - A lookup (_dst/dst_) creates _dst/dst_ on s2 and s3, associating it with _gfid-src_, as part of self-heal.
+ - Inconsistencies are:
+ * Two directories - _dst_ and _dst/dst_ - exist even though both of them didn't exist at the beginning of renamedir.
+ * Both _dst_ and _dst/dst_ have the same gfid - _gfid-src_.
As observed earlier, the symptom might be an incomplete directory listing.
+ - mkdir (dst) vs renamedir ("src", "dst")
+ - rmdir (src) vs renamedir ("src", "dst")
+ - rmdir (dst) vs renamedir ("src", "dst")
diff --git a/doc/developer-guide/ec-implementation.md b/doc/developer-guide/ec-implementation.md
new file mode 100644
index 00000000000..77e62583caa
--- /dev/null
+++ b/doc/developer-guide/ec-implementation.md
@@ -0,0 +1,588 @@
+Erasure coding implementation
+=============================
+
+This document provides information about how [erasure code][1] has been implemented in the ec translator. It describes the algorithm used and the optimizations made, but it doesn't contain a full description of the mathematical background needed to understand erasure coding in general. It also describes the other parts of ec not directly related to the encoding/decoding procedure, like synchronization or fop management.
+
+Introduction
+------------
+
+EC is based on the [Reed-Solomon][2] erasure code. It's a very old code and it's not considered the best one nowadays, but it's good enough and it's one of the few codes that is not covered by any patent and can be freely used.
+
+To define the Reed-Solomon code we use 3 parameters:
+
+ * __Key fragments (K)__
+   This represents the minimum number of healthy fragments that will be needed to recover the original data. Any subset of K out of the total number of fragments will serve.
+
+ * __Redundancy fragments (R)__
+   This represents the number of extra fragments to compute for each original data block. This value determines how many fragments can be lost before being unable to recover the original data.
+
+ * __Fragment size (S)__
+   This determines the size of each fragment. The original data block size is computed as S * K. Currently this value is fixed to 512 bytes.
+
+ * __Total number of fragments (N = K + R)__
+   This isn't a real parameter but it will be useful to simplify the following descriptions.
+
+From the point of view of the implementation, it only consists of matrix multiplications. There are two kinds of matrices to use for Reed-Solomon:
+
+ * __[Systematic][3]__
+   This kind of matrix has the particularity that K of the encoded fragments are simply a copy of the original data, divided into K pieces. Thus no real encoding needs to be done for them and only the R redundancy fragments need to be computed.
+
+   These matrices contain one KxK submatrix that is the [identity matrix][4].
+
+ * __Non-systematic__
+   This kind of matrix doesn't contain an identity submatrix. This means that all of the N fragments need to be encoded, requiring more computation. On the other hand, these matrices have some nice properties that allow faster implementations of some algorithms, like the matrix inversion used to decode the data.
+
+   Another advantage of non-systematic matrices is that the decoding time is constant, independently of how many fragments are lost, while the systematic approach can suffer from performance degradation when one fragment is lost.
+
+All non-systematic matrices can be converted to systematic ones, but then we lose the good properties of the non-systematic form. We have to choose between best peak performance (systematic) and performance stability (non-systematic).
+
+Encoding procedure
+------------------
+
+To encode a block of data we need a KxN matrix where each subset of K rows is [linearly independent][5]. In other words, the determinant of each KxK submatrix is not 0.
+
+There are some known ways to obtain these kinds of matrices. EC uses a small variation of a matrix known as the [Vandermonde Matrix][6], where each element of the matrix is defined as:
+
+    a(i, j) = i ^ (K - j)
+
+    where i is the row from 1 to N, and j is the column from 1 to K.
+
+This is exactly the Vandermonde Matrix but with the elements of each row in reverse order. This change is made to be able to implement a small optimization in the matrix multiplication.
+
+Once we have the matrix, we only need to compute the multiplication of this matrix by a vector composed of K elements of data coming from the original data block.
+
+    /                \           /                                 \
+    |    1   1  1 1 1 |  /   \   |    a +   b +   c +  d + e  =  t |
+    |   16   8  4 2 1 |  | a |   |  16a +  8b +  4c + 2d + e  =  u |
+    |   81  27  9 3 1 |  | b |   |  81a + 27b +  9c + 3d + e  =  v |
+    |  256  64 16 4 1 | *| c | = | 256a + 64b + 16c + 4d + e  =  w |
+    |  625 125 25 5 1 |  | d |   | 625a +125b + 25c + 5d + e  =  x |
+    | 1296 216 36 6 1 |  | e |   |1296a +216b + 36c + 6d + e  =  y |
+    | 2401 343 49 7 1 |  \   /   |2401a +343b + 49c + 7d + e  =  z |
+    \                /           \                                 /
+
+The optimization that can be done here is this:
+
+    16a + 8b + 4c + 2d + e = 2(2(2(2a + b) + c) + d) + e
+
+So all the multiplications are always by the number of the row (2 in this case) and we don't need temporary storage for intermediate results:
+
+    a *= 2
+    a += b
+    a *= 2
+    a += c
+    a *= 2
+    a += d
+    a *= 2
+    a += e
+
+Once we have the result vector, each element is a fragment that needs to be stored in a separate place.
+
+Decoding procedure
+------------------
+
+To recover the data we need exactly K of the fragments. We need to know which K fragments we have (i.e. the original row number from which each fragment was calculated). Once we have this data, we build a square KxK matrix composed of the rows corresponding to the given fragments and invert it.
+
+With the inverted matrix, we can recover the original data by multiplying it with the vector composed of the K fragments.
+
+In our previous example, if we consider that we have recovered fragments t, u, v, x and z, corresponding to rows 1, 2, 3, 5 and 7, we can build the following matrix:
+
+    /                 \
+    |    1   1  1 1 1 |
+    |   16   8  4 2 1 |
+    |   81  27  9 3 1 |
+    |  625 125 25 5 1 |
+    | 2401 343 49 7 1 |
+    \                 /
+
+And invert it:
+
+    /                                         \
+    |    1/48   -1/15    1/16  -1/48    1/240 |
+    |  -17/48   16/15  -15/16  13/48  -11/240 |
+    |  101/48  -86/15   73/16 -53/48   41/240 |
+    | -247/48  176/15 -129/16  83/48  -61/240 |
+    |    35/8      -7    35/8   -7/8      1/8 |
+    \                                         /
+
+Multiplying it by the vector (t, u, v, x, z) we recover the original data (a, b, c, d, e):
+
+    /                                         \   /   \   /   \
+    |    1/48   -1/15    1/16  -1/48    1/240 |   | t |   | a |
+    |  -17/48   16/15  -15/16  13/48  -11/240 |   | u |   | b |
+    |  101/48  -86/15   73/16 -53/48   41/240 | * | v | = | c |
+    | -247/48  176/15 -129/16  83/48  -61/240 |   | x |   | d |
+    |    35/8      -7    35/8   -7/8      1/8 |   | z |   | e |
+    \                                         /   \   /   \   /
+
+Galois Field
+------------
+
+This encoding/decoding procedure is quite complex to compute using regular mathematical operations and it's not well suited for what we want to do (note that the matrix elements can grow without bound).
+
+To solve this problem, exactly the same procedure is done inside a [Galois Field][7] of characteristic 2, which is a finite field with some interesting properties that make it especially useful for fast operations using computers.
+
+There are two main differences when we use this specific Galois Field:
+
+ * __All regular additions are replaced by bitwise xor's__
+   For today's computers it's not really faster to execute an xor compared to an addition; however, replacing additions by xor's inside a multiplication has many advantages (we will make use of this to optimize the multiplication).
+
+   Another consequence of this change is that additions and subtractions are really the same xor operation.
+
+ * __The elements of the matrix are bounded__
+   The field uses a modulus that keeps all possible elements inside a delimited region, avoiding really big numbers and fixing the number of bits needed to represent each value.
+
+   In the current implementation EC uses 8 bits per field element.
+
+It's very important to understand how multiplications are computed inside a Galois Field to be able to understand how it has been optimized.
+
+We'll start with a simple 'old school' multiplication, but in base 2. For example, if we want to multiply 7 * 5 (111b * 101b in binary), we do the following:
+
+          1 1 1      (= 7)
+      *   1 0 1      (= 5)
+      -------------
+          1 1 1      (= 7)
+    +   0 0 0        (= 0)
+    + 1 1 1          (= 7)
+      -------------
+      1 0 0 0 1 1    (= 35)
+
+This is quite simple. Note that the addition of the third column generates a carry that is propagated to the columns on the left.
+
+The next step is to define the modulus of the field. Suppose we use 11 as the modulus. Then we convert the result into an element of the field by dividing by the modulus and taking the residue. We also use the 'old school' method in binary:
+
+      1 0 0 0 1 1  (= 35)   | 1 0 1 1  (= 11)
+    - 0 0 0 0               ----------------
+    ---------                 0 1 1  (= 3)
+      1 0 0 0 1
+    -   1 0 1 1
+    -----------
+      0 0 1 1 0 1
+    -     1 0 1 1
+    -------------
+          0 0 1 0  (= 2)
+
+So, 7 * 5 in a field with modulus 11 is 2. Note that the main objective in each iteration of the division is to make the higher bits equal to 0 when possible (if it's not possible in one iteration, it will be zeroed on the next).
+
+If we do the same but replacing additions with xors we get this:
+
+          1 1 1      (= 7)
+      *   1 0 1      (= 5)
+      -------------
+          1 1 1      (= 7)
+    x   0 0 0        (= 0)
+    x 1 1 1          (= 7)
+      -------------
+      1 1 0 1 1      (= 27)
+
+In this case, the xor of the third column doesn't generate any carry.
+
+Now we need to divide by the modulus. We can also use 11 as the modulus since it still satisfies the conditions needed to work on a Galois Field of characteristic 2 with 3 bits:
+
+      1 1 0 1 1  (= 27)   | 1 0 1 1  (= 11)
+    x 1 0 1 1             ----------------
+    ---------               1 1  (= 3)
+      0 1 1 0 1
+    x   1 0 1 1
+    -----------
+        0 1 1 0  (= 6)
+
+Note that, in this case, to zero the highest bit we need to consider the result of the xor operation, not the addition operation.
+
+So, 7 * 5 in a Galois Field of 3 bits with modulus 11 is 6.
+
+Optimization
+------------
+
+To compute all these operations in a fast way, some methods have been traditionally used. Maybe the most common is the [lookup table][8].
+
+The problem with this method is that it requires 3 lookups for each byte multiplication, greatly amplifying the needed memory bandwidth and making it difficult to take advantage of any SIMD support on the processor.
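+
+For reference, a hedged C sketch of that traditional table-based method for an 8-bit field (assuming generator 2 and the common primitive polynomial 0x11d; an illustration only, not necessarily the exact field parameters or code that ec uses):
+
+    #include <stdint.h>
+
+    static uint8_t gf_log[256];   /* discrete logs; gf_log[0] is unused */
+    static uint8_t gf_exp[512];   /* anti-logs, doubled to avoid a modulo */
+
+    static void
+    gf_tables_init(void)
+    {
+        unsigned int x = 1;
+        for (int i = 0; i < 255; i++) {
+            gf_exp[i] = gf_exp[i + 255] = (uint8_t)x;
+            gf_log[x] = (uint8_t)i;
+            x <<= 1;                /* multiply by the generator 2 ... */
+            if (x & 0x100)
+                x ^= 0x11d;         /* ... and reduce by the modulus */
+        }
+    }
+
+    static uint8_t
+    gf_mul_table(uint8_t a, uint8_t b)
+    {
+        if (a == 0 || b == 0)
+            return 0;
+        /* the 3 lookups per byte mentioned above */
+        return gf_exp[gf_log[a] + gf_log[b]];
+    }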
+
+What EC does to improve the performance is based on the following property (using the 3-bit Galois Field of the last example):
+
+    A * B mod N = (A * b{2} * 4 mod N) x
+                  (A * b{1} * 2 mod N) x
+                  (A * b{0} mod N)
+
+This is basically a rewrite of the steps made in the previous example to multiply two numbers, but moving the modulus calculation inside each intermediate result. What we can see here is that each term of the xor can be zeroed if the corresponding bit of B is 0, so we can ignore that factor. If the bit is 1, we need to compute A multiplied by a power of two and take the residue of the division by the modulus. We can precompute these values:
+
+    A0 = A (we don't need to compute the modulus here)
+    A1 = A0 * 2 mod N
+    A2 = A1 * 2 mod N
+
+Having these values, we only need to add those corresponding to the bits set to 1 in B. Using our previous example:
+
+    A = 1 1 1 (= 7)
+    B = 1 0 1 (= 5)
+
+    A0 = 1 1 1 (= 7)
+    A1 = 1 1 1 * 1 0 mod 1 0 1 1 = 1 0 1 (= 5)
+    A2 = 1 0 1 * 1 0 mod 1 0 1 1 = 0 0 1 (= 1)
+
+    Since only bits 0 and 2 are 1 in B, we add A0 and A2:
+
+    A0 + A2 = 1 1 1 x 0 0 1 = 1 1 0 (= 6)
+
+If we carefully look at what we are doing when computing each Ax, we see that we do two basic things:
+
+ - Shift the original value one bit to the left
+ - If the highest bit is 1, xor with the modulus
+
+Let's write this in a detailed way (representing each bit):
+
+    Original value:  a{2} a{1} a{0}
+    Shift 1 bit:     a{2} a{1} a{0} 0
+
+    If a{2} is 0 we already have the result:
+        a{1} a{0} 0
+
+    If a{2} is 1 we need to xor with the modulus:
+        1 a{1} a{0} 0 x 1 0 1 1 = a{1} (a{0} x 1) 1
+
+An important thing to see here is that if a{2} is 0, we can get the same result by xoring with all 0s instead of the modulus. For this reason we can rewrite the modulus as this:
+
+    Modulus: a{2} 0 a{2} a{2}
+
+This means that the modulus will be 0 0 0 0 if a{2} is 0, so the value won't change, and it will be 1 0 1 1 if a{2} is 1, giving the correct result. So the computation is simply:
+
+    Original value:  a{2} a{1} a{0}
+    Shift 1 bit:     a{2} a{1} a{0} 0
+    Apply modulus:   a{1} (a{0} x a{2}) a{2}
+
+We can compute all Ax using this method. We'll get this:
+
+    A0 =          a{2}          a{1} a{0}
+    A1 =          a{1} (a{0} x a{2}) a{2}
+    A2 = (a{0} x a{2}) (a{1} x a{2}) a{1}
+
+Once we have all the terms, we xor the ones corresponding to the bits set to 1 in B. In our example this will be A0 and A2:
+
+    Result: (a{2} x a{0} x a{2}) (a{1} x a{1} x a{2}) (a{0} x a{1})
+
+We can easily see that we can remove some redundant factors:
+
+    Result: a{0} a{2} (a{0} x a{1})
+
+This way we have come up with a simple set of equations to compute the multiplication of any number by 5. If A is 1 1 1 (= 7), the result of the equations must be 1 1 0 (= 6), as we expected. If we try another number for A, like 0 1 0 (= 2), the result must be 0 0 1 (= 1).
+
+This seems like a really fast way to compute the multiplication without using any table lookup. The problem is that this is only valid for B = 5. For other values of B another set of equations will be found. To solve this problem we can pregenerate the equations for all possible values of B. Since the Galois Field we use is small, this is feasible.
+
+One thing to be aware of is that, in general, two equations for different bits of the same B can share common subexpressions. This leaves room for further optimizations to reduce the total number of xors used in the final equations for a given B.
However, this is not easy to find, since finding the smallest number of xors that gives the correct result is an NP-hard problem. For EC an exhaustive search has been made to find the best combinations for each possible value.
+
+Implementation
+--------------
+
+All this seems great from the hardware point of view, but implementing it using normal processor instructions is not so easy, because we would need a lot of shifts, ands, xors and ors to move the bits of each number to the correct position to compute the equation, and then another shift to put each bit back in its final place.
+
+For example, to implement the function to multiply by 5, we would need something like this:
+
+    Bit 2:  T2 = (A & 1) << 2
+    Bit 1:  T1 = (A & 4) >> 1
+    Bit 0:  T0 = ((A >> 1) x A) & 1
+    Result: T2 + T1 + T0
+
+This doesn't look good. So here we make a change in the way we get and process the data: instead of reading full numbers into variables and operating with them afterwards, we use a single independent variable for each bit of the number.
+
+Assume that we can read and write independent bits from memory (later we'll see how to solve this problem when this is not possible). In this case, the code would look something like this:
+
+    Bit 2: T2 = Mem[2]
+    Bit 1: T1 = Mem[1]
+    Bit 0: T0 = Mem[0]
+    Computation:  T1 ^= T0
+    Store result: Mem[2] = T0
+                  Mem[1] = T2
+                  Mem[0] = T1
+
+Note that in this case we handle the final reordering of bits simply by storing the right variable to the right place, without any shifts, ands or ors. In fact we only have memory loads, memory stores and xors. Note also that we can do all the computations directly using the variables themselves, without additional storage. This is true for most of the values, but in some cases one or two additional temporary variables will be needed to store intermediate results.
+
+The drawback of this approach is that additions, which are simply a xor of two numbers, will need as many xors as there are bits in each number.
+
+SIMD optimization
+-----------------
+
+So we have a good way to compute the multiplications, but even using this we'll need several operations for each byte of the original data. We can improve this by doing multiple multiplications using the same set of instructions.
+
+With the approach taken in the implementation section, we can see that it's in fact really easy to add SIMD support to this method. We only need to store in each variable one bit from multiple numbers. For example, when we load T2 from memory, instead of reading bit 2 of the first number, we can read bit 2 of the first, second, third, fourth, ... numbers. The same can be done when loading T1 and T0.
+
+Obviously this needs a special encoding of the numbers into memory to be able to do that in a single operation, but since we can choose whatever encoding we want for EC, we have chosen exactly that. We interpret the original data as a stream of bits, and we split it into subsequences of length L, each containing one bit of a number. Every S subsequences form a set of numbers of S bits that are encoded and decoded as a single group. This repeats for any remaining data.
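+
+With this layout, the per-bit variables of the previous section become whole machine words. A hedged C sketch of the multiply-by-5 from the 3-bit field above, processing 64 numbers at once (a hypothetical helper assuming L = 64; the real ec code works on bigger blocks and registers):
+
+    #include <stdint.h>
+
+    /* in[i]/out[i] hold bit i of 64 packed numbers each. */
+    static void
+    gf3_mul5_bitsliced(const uint64_t in[3], uint64_t out[3])
+    {
+        uint64_t t0 = in[0], t1 = in[1], t2 = in[2];
+
+        t1 ^= t0;      /* the single xor from the equations for B = 5 */
+
+        out[2] = t0;   /* result bit 2 = a{0}        */
+        out[1] = t2;   /* result bit 1 = a{2}        */
+        out[0] = t1;   /* result bit 0 = a{0} x a{1} */
+    }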
+
+For example, in a simple case with L = 8 and S = 3, the original data would contain something like this (interpreted as a sequence of bits; offsets are also bit-based):
+
+    Offset  0: a{0} b{0} c{0} d{0} e{0} f{0} g{0} h{0}
+    Offset  8: a{1} b{1} c{1} d{1} e{1} f{1} g{1} h{1}
+    Offset 16: a{2} b{2} c{2} d{2} e{2} f{2} g{2} h{2}
+    Offset 24: i{0} j{0} k{0} l{0} m{0} n{0} o{0} p{0}
+    Offset 32: i{1} j{1} k{1} l{1} m{1} n{1} o{1} p{1}
+    Offset 40: i{2} j{2} k{2} l{2} m{2} n{2} o{2} p{2}
+
+Note: If the input file is not a multiple of S * L bits, 0-padding is done.
+
+Here we have 16 numbers encoded, from A to P. This way we can easily see that reading the first byte of the file will read all bits 0 of numbers A, B, C, D, E, F, G and H. The same happens with bits 1 and 2 when we read the second and third bytes respectively. Using this encoding and the implementation described above, we can see that the same set of instructions will be computing the multiplication of 8 numbers at the same time.
+
+This can be further improved if we use L = 64 with 64-bit variables on a 64-bit processor. It's even faster if we use L = 128 using SSE registers or L = 256 using AVX registers on Intel processors.
+
+Currently EC uses L = 512 and S = 8. This means that numbers are packed in blocks of 512 bytes, and it leaves room for even bigger processor registers, up to 512 bits.
+
+Conclusions
+-----------
+
+This method requires a single variable/processor register for each bit. This can be challenging if we want to avoid additional memory accesses, even if we use modern processors that have many registers. However, the implementation we chose for the Vandermonde Matrix doesn't require temporary storage, so we don't need a full set of 8 new registers (one for each bit) to store partial computations. Additionally, the computation of the multiplications requires, at most, 2 extra registers, but this is affordable.
+
+Xors are a really fast operation in modern processors. Intel CPUs can dispatch up to 3 xors per CPU cycle if there are no dependencies with ongoing previous instructions. The worst case is 1 xor per cycle. So, in some configurations, this method could run very near memory speed.
+
+Another interesting property of this method is that all the data it needs to operate on is packed in small sequential blocks of memory, meaning that it can take advantage of the faster internal CPU caches.
+
+Results
+-------
+
+For the particular case of 8 bits, EC can compute each multiplication using 12.8 xors on average (without counting 0 and 1, which do not require any xor). Some numbers require fewer, like 2, which only requires 3 xors.
+
+Having all this, we can check some numbers to see the performance of this method.
+
+Maybe the most interesting thing is the average number of xors needed to encode a single byte of data.
To compute this we'll need to define some variables:
+
+ * K: Number of data fragments
+ * R: Number of redundancy fragments
+ * N: K + R
+ * B: Number of bits per number
+ * A: Average number of xors per number
+ * Z: Bits per CPU register (can be up to 256 for AVX registers)
+ * X: Average number of xors per CPU cycle
+ * L: Average cycles per load
+ * S: Average cycles per store
+ * G: Core speed in Hz
+
+_Total number of bytes processed for a single matrix multiplication_:
+
+ * __Read__: K * B * Z / 8
+ * __Written__: N * B * Z / 8
+
+_Total number of memory accesses_:
+
+ * __Loads__: K * B * N
+ * __Stores__: B * N
+
+> We need to read the same K * B * Z bits, in registers of Z bits, N times, one for each row of the matrix. However, the last N - 1 reads could be made from the internal CPU caches if conditions are good.
+
+_Total number of operations_:
+
+ * __Additions__: (K - 1) * N
+ * __Multiplications__: K * N
+
+__Total number of xors__: B * (K - 1) * N + A * K * N = N * ((A + B) * K - B)
+
+__Xors per byte__: 8 * N * ((A + B) * K - B) / (K * B * Z)
+
+__CPU cycles per byte__: 8 * N * ((A + B) * K - B) / (K * B * Z * X) +
+                         8 * L * N / Z +                    (loads)
+                         8 * S * N / (K * Z)                (stores)
+
+__Bytes per second__: G / {CPU cycles per byte}
+
+Some xors-per-byte numbers for specific configurations (B=8):
+
+               Z=64    Z=128   Z=256
+    K=2/R=1    0.79    0.39    0.20
+    K=4/R=2    1.76    0.88    0.44
+    K=4/R=3    2.06    1.03    0.51
+    K=8/R=3    3.40    1.70    0.85
+    K=8/R=4    3.71    1.86    0.93
+    K=16/R=4   6.34    3.17    1.59
+
+[1]: https://en.wikipedia.org/wiki/Erasure_code
+[2]: https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction
+[3]: https://en.wikipedia.org/wiki/Systematic_code
+[4]: https://en.wikipedia.org/wiki/Identity_matrix
+[5]: https://en.wikipedia.org/wiki/Linear_independence
+[6]: https://en.wikipedia.org/wiki/Vandermonde_matrix
+[7]: https://en.wikipedia.org/wiki/Finite_field
+[8]: https://en.wikipedia.org/wiki/Finite_field_arithmetic#Implementation_tricks
diff --git a/doc/developer-guide/fuse-interrupt.md b/doc/developer-guide/fuse-interrupt.md
new file mode 100644
index 00000000000..ec991b81ec5
--- /dev/null
+++ b/doc/developer-guide/fuse-interrupt.md
@@ -0,0 +1,211 @@
+# Fuse interrupt handling
+
+## Conventions followed
+
+- *FUSE* refers to the "wire protocol" between kernel and userspace and related specifications.
+- *fuse* refers to the kernel subsystem and also to the GlusterFS translator.
+
+## FUSE interrupt handling spec
+
+The [Linux kernel FUSE documentation](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/fuse.txt?h=v4.18#n148) describes how interrupt handling happens in fuse.
+
+## Interrupt handling in the fuse translator
+
+### Declarations
+
+This document describes the internal API in the fuse translator with which interrupts can be handled.
+
+The API is internal (to be used only in fuse-bridge.c); the functions are not exported to a header file.
+
+```
+enum fuse_interrupt_state {
+    /* ... */
+    INTERRUPT_SQUELCHED,
+    INTERRUPT_HANDLED,
+    /* ... */
+};
+typedef enum fuse_interrupt_state fuse_interrupt_state_t;
+struct fuse_interrupt_record;
+typedef struct fuse_interrupt_record fuse_interrupt_record_t;
+typedef void (*fuse_interrupt_handler_t)(xlator_t *this,
+                                         fuse_interrupt_record_t *);
+struct fuse_interrupt_record {
+    fuse_in_header_t fuse_in_header;
+    void *data;
+    /*
+    ...
+     */
+};
+
+fuse_interrupt_record_t *
+fuse_interrupt_record_new(fuse_in_header_t *finh,
+                          fuse_interrupt_handler_t handler);
+
+void
+fuse_interrupt_record_insert(xlator_t *this, fuse_interrupt_record_t *fir);
+
+gf_boolean_t
+fuse_interrupt_finish_fop(call_frame_t *frame, xlator_t *this,
+                          gf_boolean_t sync, void **datap);
+
+void
+fuse_interrupt_finish_interrupt(xlator_t *this, fuse_interrupt_record_t *fir,
+                                fuse_interrupt_state_t intstat,
+                                gf_boolean_t sync, void **datap);
+```
+
+The code demonstrates the usage of the API through `fuse_flush()`. (It's a dummy implementation, only for demonstration purposes.) Flush is chosen because a `FLUSH` interrupt is easy to trigger (see *tests/features/interrupt.t*). Interrupt handling for flush is switched on by `--fuse-flush-handle-interrupt` (a hidden glusterfs command line flag). The implementation of the flush interrupt is contained in the `fuse_flush_interrupt_handler()` function and in the blocks guarded by the
+
+```
+if (priv->flush_handle_interrupt) { ...
+```
+
+conditional (where `priv` is a `fuse_private_t *`).
+
+### Overview
+
+"Regular" fuse fops and interrupt handlers interact via a list containing interrupt records.
+
+If a fop wishes to have its interrupts handled, it needs to set up an interrupt record and insert it into the list; also, when it's about to finish (i.e. in its "cbk" stage), it needs to delete the record from the list.
+
+If no interrupt happens, that's basically all there is to it - a list insertion and deletion.
+
+However, if an interrupt comes for the fop, the interrupt FUSE request will carry the data identifying an ongoing fop (that is, its `unique`), and based on that, the interrupt record will be looked up in the list, and the specific interrupt handler (a member of the interrupt record) will be called.
+
+Usually the fop needs to share some data with the interrupt handler to enable it to perform its task (also shared via the interrupt record). The interrupt API offers two approaches to manage shared data:
+- _Async or reference-counting strategy_: from the point when the interrupt record is inserted into the list, it's owned jointly by the regular fop and the prospective interrupt handler. Both of them need to check before they return whether the other is still holding a reference; if not, then they are responsible for reclaiming the shared data.
+- _Sync or borrow strategy_: the interrupt handler is considered a borrower of the shared data. The interrupt handler should not reclaim the shared data. The fop will wait for the interrupt handler to finish (i.e. for the borrow to be returned); then it has to reclaim the shared data.
+
+The user of the interrupt API needs to call the following functions to instrument this control flow:
+- `fuse_interrupt_record_insert()` in the fop, to insert the interrupt record into the list;
+- `fuse_interrupt_finish_fop()` in the fop (cbk), and
+- `fuse_interrupt_finish_interrupt()` in the interrupt handler,
+
+to perform the needed synchronization at the end of their tenure. The data management strategies are implemented by the `fuse_interrupt_finish_*()` functions (which have an argument to specify which strategy to use); these routines take care of freeing the interrupt record itself, while the reclamation of the shared data is left to the API user.
+
+### Usage
+
+A given FUSE fop can be enabled to handle interrupts via the following steps:
+
+- Define a handler function (of type `fuse_interrupt_handler_t`).
+  It should implement the interrupt handling logic and in the end call (directly or as an async callback) `fuse_interrupt_finish_interrupt()`. The `intstat` argument to `fuse_interrupt_finish_interrupt()` should be either `INTERRUPT_SQUELCHED` or `INTERRUPT_HANDLED`.
+  - `INTERRUPT_SQUELCHED` means that the interrupt could not be delivered and the fop is going on uninterrupted.
+  - `INTERRUPT_HANDLED` means that the interrupt was actually handled. In this case the fop will be answered from interrupt context with errno `EINTR` (that is, the fop should not send a response to the kernel).
+
+  (The enum `fuse_interrupt_state` includes further members, which are reserved for internal use.)
+
+  We return to the `sync` and `datap` arguments later.
+- In the `fuse_<FOP>` function, create an interrupt record using `fuse_interrupt_record_new()`, passing the incoming `fuse_in_header` and the above handler function to it.
+  - Arbitrary further data can be referred to via the `data` member of the interrupt record; this is to be passed on from fop context to interrupt context.
+- When it's set up, pass the interrupt record to `fuse_interrupt_record_insert()`.
+- In `fuse_<FOP>_cbk`, call `fuse_interrupt_finish_fop()`.
+  - `fuse_interrupt_finish_fop()` returns a Boolean according to whether the interrupt was handled. If it was, then the FUSE request is already answered and the stack gets destroyed in `fuse_interrupt_finish_fop()`, so `fuse_<FOP>_cbk()` can just return (zero). Otherwise follow the standard cbk logic (answer the FUSE request and destroy the stack -- these are typically accomplished by `fuse_err_cbk()`).
+- The last two arguments of `fuse_interrupt_finish_fop()` and `fuse_interrupt_finish_interrupt()` are `gf_boolean_t sync` and `void **datap`.
+  - `sync` represents the strategy for freeing the interrupt record. The interrupt handler and the fop handler are in a race to get at the interrupt record first (the interrupt handler for the purpose of doing the interrupt handling, the fop handler for the purpose of deactivating the interrupt record upon completion of the fop handling).
+    - If `sync` is true, then the fop handler will wait for the interrupt handler to finish and it takes care of the freeing.
+    - If `sync` is false, the loser of the above race will perform the freeing.
+
+    Freeing is done within the respective interrupt finish routines, except for the `data` field of the interrupt record; with respect to that, see the discussion of the `datap` parameter below. The strategy has to be consensual, that is, `fuse_interrupt_finish_fop()` and `fuse_interrupt_finish_interrupt()` must pass the same value for `sync`. If dismantling the resources associated with the interrupt record is simple, `sync = _gf_false` is the suggested choice; `sync = _gf_true` can be useful in the opposite case, when dismantling those resources would be inconvenient to implement in two places or to enact in non-fop context.
+  - If `datap` is `NULL`, the `data` member of the interrupt record will be freed within the interrupt finish routine. If it points to a valid `void *` pointer, and if the caller is doing the cleanup (see `sync` above), then that pointer will be directed to the `data` member of the interrupt record and it's up to the caller what it does with it.
+    - If `sync` is true, the interrupt handler can use `datap = NULL`, and the fop handler will have `datap` point to a valid pointer.
+    - If `sync` is false, and the handlers pass a pointer to a pointer for `datap`, they should check whether the pointed-to pointer is NULL before attempting to deal with the data.
+
+### FUSE answer for the interrupted fop
+
+The kernel acknowledges a successful interruption for a given FUSE request if the filesystem daemon answers it with errno EINTR; upon that, the syscall which induced the request will be abruptly terminated with an interrupt, rather than returning a value.
+
+In glusterfs, this can be arranged in two ways.
+
+- If the interrupt handler wins the race for the interrupt record, i.e. `fuse_interrupt_finish_fop()` returns true to `fuse_<FOP>_cbk()`, then, as said above, `fuse_<FOP>_cbk()` does not need to answer the FUSE request. That's because the interrupt handler will then take care of answering it (with errno EINTR).
+- If `fuse_interrupt_finish_fop()` returns false to `fuse_<FOP>_cbk()`, then this return value does not tell the fop handler whether there was an interrupt or not. This return value occurs both when the fop handler won the race for the interrupt record against the interrupt handler, and when there was no interrupt at all.
+
+  However, the internal logic of the fop handler might detect from other circumstances that an interrupt was delivered. For example, the fop handler might be sleeping, waiting for some data to arrive, so that a premature wakeup (with no data present) occurs if the interrupt handler intervenes. In such cases it's the responsibility of the fop handler to reply to the FUSE request with errno EINTR.
diff --git a/doc/developer-guide/identifying-resource-leaks.md b/doc/developer-guide/identifying-resource-leaks.md
new file mode 100644
index 00000000000..950cae79b0a
--- /dev/null
+++ b/doc/developer-guide/identifying-resource-leaks.md
@@ -0,0 +1,200 @@
+# Identifying Resource Leaks
+
+Like most other pieces of software, GlusterFS is not perfect in how it manages its resources like memory, threads and the like. Gluster developers try hard to prevent leaking resources by releasing and deallocating the structures that are used. Unfortunately, every now and then, some resource leaks are unintentionally added.
+
+This document tries to explain a few helpful tricks to identify resource leaks so that they can be addressed.
+
+## Debug Builds
+
+There are certain techniques used in GlusterFS that make it difficult to use tools like Valgrind for memory leak detection. There are some build options that make it more practical to use Valgrind and other tools. When running Valgrind, it is important to have GlusterFS builds that contain the debuginfo/symbols. Some distributions (try to) strip the debuginfo to get smaller executables. Fedora- and RHEL-based distributions have sub-packages called ...-debuginfo that need to be installed for symbol resolving.
+
+### Memory Pools
+
+By using memory pools, no allocation/freeing of single structures is needed. This improves performance, but also makes it impossible to track the allocation and freeing of structures.
+
+It is possible to disable the use of memory pools, and use the standard `malloc()` and `free()` functions provided by the C library. Valgrind is then able to track the allocated areas and verify whether they have been freed.
In order to disable memory pools, the Gluster sources need to be configured with the `--enable-debug` option:
+
+```shell
+./configure --enable-debug
+```
+
+When building RPMs, the `.spec` handles the `--with=debug` option too:
+
+```shell
+make dist
+rpmbuild -ta --with=debug glusterfs-....tar.gz
+```
+
+### Dynamically Loaded xlators
+
+Valgrind tracks the call chain of functions that do memory allocations. The addresses of the functions are stored and, before Valgrind exits, the addresses are resolved into human-readable function names and offsets (line numbers in source files). Because Gluster loads xlators dynamically, and unloads them before exiting, Valgrind is not able to resolve the function addresses into symbols anymore. Whenever this happens, Valgrind shows `???` in the output, like
+
+```
+ ==25170== 344 bytes in 1 blocks are definitely lost in loss record 233 of 324
+ ==25170==    at 0x4C29975: calloc (vg_replace_malloc.c:711)
+ ==25170==    by 0x52C7C0B: __gf_calloc (mem-pool.c:117)
+ ==25170==    by 0x12B0638A: ???
+ ==25170==    by 0x528FCE6: __xlator_init (xlator.c:472)
+ ==25170==    by 0x528FE16: xlator_init (xlator.c:498)
+ ...
+```
+
+These `???` can be prevented by not calling `dlclose()` for unloading the xlator. This will cause a small leak of the handle that was returned by `dlopen()`, but for improved debugging this can be acceptable. For this and other Valgrind features, a `--enable-valgrind` option is available to `./configure`. When GlusterFS is built with this option, Valgrind will be able to resolve the symbol names of the functions that do memory allocations inside xlators.
+
+```shell
+./configure --enable-valgrind
+```
+
+When building RPMs, the `.spec` handles the `--with=valgrind` option too:
+
+```shell
+make dist
+rpmbuild -ta --with=valgrind glusterfs-....tar.gz
+```
+
+## Running Valgrind against a single xlator
+
+Debugging a single xlator is not trivial, but there are some tools to make it easier. The `sink` xlator does not do any memory allocations itself, but contains just enough functionality to mount a volume with only the `sink` xlator. There is a little gfapi application under `tests/basic/gfapi/` in the GlusterFS sources that can be used to run only gfapi and the core GlusterFS infrastructure with the `sink` xlator. By extending the `.vol` file to load more xlators, each xlator can be debugged pretty much separately (as long as the xlators have no dependencies on each other).
A basic Valgrind run with the suitable configure options looks like this:
+
+```shell
+./autogen.sh
+./configure --enable-debug --enable-valgrind
+make && make install
+cd tests/basic/gfapi/
+make gfapi-load-volfile
+valgrind ./gfapi-load-volfile sink.vol
+```
+
+Combined with other very useful Valgrind options, the following execution shows many more useful details:
+
+```shell
+valgrind \
+    --fullpath-after= --leak-check=full --show-leak-kinds=all \
+    ./gfapi-load-volfile sink.vol
+```
+
+Note that the `--fullpath-after=` option is left empty; this makes Valgrind print the full path and filename that contains the functions:
+
+```
+==2450== 80 bytes in 1 blocks are definitely lost in loss record 8 of 60
+==2450==    at 0x4C29975: calloc (/builddir/build/BUILD/valgrind-3.11.0/coregrind/m_replacemalloc/vg_replace_malloc.c:711)
+==2450==    by 0x52C6F73: __gf_calloc (/usr/src/debug/glusterfs-3.11dev/libglusterfs/src/mem-pool.c:117)
+==2450==    by 0x12F10CDA: init (/usr/src/debug/glusterfs-3.11dev/xlators/meta/src/meta.c:231)
+==2450==    by 0x528EFD5: __xlator_init (/usr/src/debug/glusterfs-3.11dev/libglusterfs/src/xlator.c:472)
+==2450==    by 0x528F105: xlator_init (/usr/src/debug/glusterfs-3.11dev/libglusterfs/src/xlator.c:498)
+==2450==    by 0x52D9D8B: glusterfs_graph_init (/usr/src/debug/glusterfs-3.11dev/libglusterfs/src/graph.c:321)
+...
+```
+
+In the above example, the `init` function in `xlators/meta/src/meta.c` does a memory allocation on line 231. This memory is never freed again, and hence Valgrind logs this call stack. When looking at the code, it seems that the allocation of `priv` is assigned to the `this->private` member of the `xlator_t` structure. Because the allocation is done in `init()`, freeing is expected to happen in `fini()`. Both functions are shown below, with the inclusion of the empty `fini()`:
+
+```
+226 int
+227 init (xlator_t *this)
+228 {
+229         meta_priv_t *priv = NULL;
+230
+231         priv = GF_CALLOC (sizeof(*priv), 1, gf_meta_mt_priv_t);
+232         if (!priv)
+233                 return -1;
+234
+235         GF_OPTION_INIT ("meta-dir-name", priv->meta_dir_name, str, out);
+236
+237         this->private = priv;
+238 out:
+239         return 0;
+240 }
+241
+242
+243 int
+244 fini (xlator_t *this)
+245 {
+246         return 0;
+247 }
+```
+
+In this case, the resource leak can be addressed by adding a single line to the `fini()` function:
+
+```
+243 int
+244 fini (xlator_t *this)
+245 {
+246         GF_FREE (this->private);
+247         return 0;
+248 }
+```
+
+Running the same Valgrind command and comparing the output will show that the memory leak in `xlators/meta/src/meta.c:init` is not reported anymore.
+
+### Running DRD, the Valgrind thread error detector
+
+When configuring GlusterFS with:
+
+```shell
+./configure --enable-valgrind
+```
+
+the default Valgrind tool (Memcheck) is enabled. But it's also possible to select one of Memcheck or DRD by using:
+
+```shell
+./configure --enable-valgrind=memcheck
+```
+
+or:
+
+```shell
+./configure --enable-valgrind=drd
+```
+
+respectively. When using DRD, it's recommended to consult https://valgrind.org/docs/manual/drd-manual.html before running.
diff --git a/doc/developer-guide/logging-guidelines.md b/doc/developer-guide/logging-guidelines.md
index 58adf944b67..0e6b2588535 100644
--- a/doc/developer-guide/logging-guidelines.md
+++ b/doc/developer-guide/logging-guidelines.md
@@ -62,7 +62,7 @@ There are 2 interfaces provided to log messages,
   headers (like the time stamp, dom, errnum etc.).
   The primary users of the above interfaces are, when printing the final graph, or printing the configuration when a process is about dump core or abort, or
-  printing the backtrace when a process recieves a critical signal
+  printing the backtrace when a process receives a critical signal
 - These interfaces should not be used outside the scope of the users above, unless you know what you are doing
diff --git a/doc/developer-guide/network_compression.md b/doc/developer-guide/network_compression.md
index 7327591ef63..1222a765276 100644
--- a/doc/developer-guide/network_compression.md
+++ b/doc/developer-guide/network_compression.md
@@ -1,9 +1,9 @@
-#On-Wire Compression + Decompression
+# On-Wire Compression + Decompression
 
 The 'compression translator' compresses and decompresses data in-flight between client and bricks.
 
-###Working
+### Working
 When a writev call occurs, the client compresses the data before sending it to brick. On the brick, compressed data is decompressed. Similarly, when a readv call occurs, the brick compresses the data before sending it to client. On the
@@ -19,7 +19,7 @@ During normal operation, this is the format of data sent over wire:
 
 The trailer contains the CRC32 checksum and length of original uncompressed data. This is used for validation.
 
-###Usage
+### Usage
@@ -27,7 +27,7 @@ Turning on compression xlator:
 
 ~~~
 gluster volume set <vol_name> network.compression on
 ~~~
 
-###Configurable parameters (optional)
+### Configurable parameters (optional)
 
 **Compression level**
 ~~~
@@ -35,10 +35,10 @@ gluster volume set <vol_name> network.compression.compression-level 8
 ~~~
 
 ~~~
-0 : no compression
-1 : best speed
-9 : best compression
--1 : default compression
+ 0 : no compression
+ 1 : best speed
+ 9 : best compression
+-1 : default compression
 ~~~
 
 **Minimum file size**
@@ -55,7 +55,7 @@ Other less frequently used parameters include `network.compression.mem-level`
 and `network.compression.window-size`. More details can about these options can be found by running `gluster volume set help` command.
 
-###Known Issues and Limitations
+### Known Issues and Limitations
 
 * Compression translator cannot work with striped volumes.
 * Mount point hangs when writing a file with write-behind xlator turned on. To
@@ -65,7 +65,7 @@ set`performance.strict-write-ordering` to on.
 distribute volumes. This limitation is caused by AFR not being able to propagate xdata. This issue has been fixed in glusterfs versions > 3.5
 
-###TODO
+### TODO
 Although zlib offers high compression ratio, it is very slow. We can make the translator pluggable to add support for other compression methods such as [lz4 compression](https://code.google.com/p/lz4/)
diff --git a/doc/developer-guide/options-to-contribute.md b/doc/developer-guide/options-to-contribute.md
new file mode 100644
index 00000000000..3f0d84e7645
--- /dev/null
+++ b/doc/developer-guide/options-to-contribute.md
@@ -0,0 +1,212 @@
+# A guide for contributors
+
+If you have gone through the 'how to contribute' guides but are not sure what to work on, and really want to help the project, you have now landed on the right document :-)
+
+### Basic
+
+Instead of planning to fix **all** the below issues in one patch, we recommend you keep a constant, continuous flow of improvements to the project. We recommend you pick 1 file (or just a few files) at a time to address the below issues. Pick any `.c` (or `.h`) file, and you can send a patch which fixes **any** of the below themes.
Ideally, fix all such occurrences in the file, even though the reviewers would review even a single-line change patch from you.
+
+1. Check the variable definitions, and if there is a very large array definition at the top of the function, see if you can re-scope the variable to the relevant sections (if it helps).
+
+Most of the time, some of these arrays may be used only for 'error' handling, and it is possible to use them only in that scope.
+
+Reference: https://review.gluster.org/20846/
+
+2. Check for complete string initialization at the beginning of a function. Ideally, there is no reason to initialize a string. Fix it across the file.
+
+Example:
+
+`char new_path_name[PATH_MAX] = {0};` to `char new_path_name[PATH_MAX];`
+
+3. Change `calloc()` to `malloc()` wherever it makes sense.
+
+In the case of allocating a structure where you expect certain (or most of the) members to be 0 (or NULL), it makes sense to use calloc(). But otherwise, there is an extra cost to `memset()` the whole object after allocating it. While it is not a significant performance improvement, for code which gets hit thousands of times a second it would add some value.
+
+Reference: https://review.gluster.org/20878/
+
+4. You can consider using `snprintf()` instead of `strncpy()` while dealing with strings.
+
+strncpy() won't null-terminate if the dest buffer isn't big enough; snprintf() does. Also, while most of the string operations in the code are on arrays larger than required, strncpy() does an extra copy of 0s at the end of the string up to the size of the array. It makes sense to use `snprintf()`, which doesn't suffer from that behavior.
+
+Also check the return value of snprintf() for buffer overflow and handle it accordingly.
+
+Reference: https://review.gluster.org/20925/
+
+5. Now, pick a `.h` file and see if a structure is very large; check whether re-aligning its members as per the [coding standard](./coding-standard.md) gives any size benefit. If yes, go ahead and change it. Make sure you check all the structures in the file for a similar pattern.
+
+Reference: [Check this section](https://github.com/gluster/glusterfs/blob/master/doc/developer-guide/coding-standard.md#structure-members-should-be-aligned-based-on-the-padding-requirements)
+
+### If you are up for more :-)
+
+Good progress! Glad you are interested to know more. We are surely interested in the next level of contributions from you!
+
+#### Coverity
+
+Visit the [Coverity Dashboard](https://scan.coverity.com/projects/gluster-glusterfs?tab=overview).
+
+Now, if the number of defects is not 0, you have an opportunity to contribute.
+
+You get all the details on why the particular defect is mentioned there, and most probably a hint on how to fix it. Do it!
+
+Reference: https://review.gluster.org/21394/
+
+Use the same reference Id (789278) in the patch, so we can capture it in a single bugzilla.
+
+#### Clang-Scan
+
+Clang-Scan is a tool which scans the .c files and reports possible issues; it is similar to coverity, but a different tool. Over the years we have seen that they both report very different sets of issues, and hence there is value in fixing both.
+
+The GlusterFS project gets tested with a clang-scan job every night, and the report is posted on the [job details page](https://build.gluster.org/job/clang-scan/lastCompletedBuild/clangScanBuildBugs/). As long as the number in the report here is not 0, you have an opportunity to contribute!
Similar to the coverity dashboard, click on 'Details' to find out the reason behind a report, and send a patch.
+
+Reference: https://review.gluster.org/21025
+
+Again, you can use the reference Id (1622665) for these patches!
+
+### I am good with programming, I would like to do more than the above!
+
+#### Locked regions / Critical sections
+
+In the file you open, see if a lock is taken only to increment or decrement a flag, counter etc. If yes, then we recommend you convert it to atomic operations. It is a simple activity, but if you know programming, you would know the benefit here.
+
+NOTE: There may not always be a possibility to do this! You may have to check with the developers first before going ahead.
+
+Reference: https://review.gluster.org/21221/
+
+#### ASan (address sanitizer)
+
+[The job](https://build.gluster.org/job/asan/) runs the regression with asan builds, and you can also run glusterfs with asan on your workload to identify leaks. If there are any leaks reported, feel free to check them, and send us a patch.
+
+You can also run `valgrind` and let us know what it reports.
+
+Reference: https://review.gluster.org/21397
+
+#### Porting to different architectures
+
+This is something which we are not focusing on right now; happy to collaborate!
+
+Reference: https://review.gluster.org/21276
+
+#### Fix 'TODO/FIXME' in the codebase
+
+There are a few pending features or pending validations which have been pending for some time. You can pick them in a given file, and choose to fix them.
+
+### I don't know C, but I am interested to contribute in some way!
+
+You are most welcome! Our community is open for your contribution! The first thing which comes to our mind is **documentation**. Next is **testing** or validation.
+
+If you have some hardware and want to run some performance comparisons with different versions or options, helping us to tune better is also a great help.
+
+#### Documentation
+
+1. We have some documentation in the [glusterfs repo](../); go through it, and see if you can help us keep it up-to-date.
+
+2. https://docs.gluster.org is powered by the https://github.com/gluster/glusterdocs repo. You can check out the repo, and help in keeping it up-to-date.
+
+3. [Our website](https://gluster.org) is maintained in the https://github.com/gluster/glusterweb repo. Help us to keep this up-to-date, and add content there.
+
+4. Write blogs about Gluster and your experience, and let the world know a little more about Gluster, your use-case, and how it helped to solve your problem.
+
+#### Testing
+
+1. There is a regression test suite in glusterfs which runs with every patch, and is triggered by just running `./run-tests.sh` from the root of the project repo.
+
+You can add more test cases to match your use-case and send them as a patch, so you can make sure all future patches in glusterfs keep your use-case intact.
+
+2. [Glusto-Tests](https://github.com/gluster/glusto-tests): This is another testing framework written for gluster; it makes use of a clustered setup to test different use-cases, and helps to validate many bugs.
+
+#### Ansible
+
+The Gluster organization has a rich set of ansible roles, which are actively maintained. Feel free to check them out here - https://github.com/gluster/gluster-ansible
+
+#### Monitoring
+
+We have a prometheus repo, and are actively working on adding more metrics.
Add what you need @ https://github.com/gluster/gluster-prometheus
+
+#### Health-Report
+
+This is a project where, if at any given point in time you want to run some set of commands locally and get an output to analyze the status, it can be added. Contribute @ https://github.com/gluster/gluster-health-report
+
+### All this C/bash/python is old-school; I want something in containers.
+
+We have something for you too :-)
+
+Please visit our https://github.com/gluster/gcs repo to check how you can help, and how gluster can help you in the container world.
+
+### Note
+
+For any queries, the best way is to contact us through the mailing list, <mailto:gluster-devel@gluster.org>
diff --git a/doc/developer-guide/rpc-for-glusterfs.new-versions.md b/doc/developer-guide/rpc-for-glusterfs.new-versions.md
new file mode 100644
index 00000000000..e3da5efa4a2
--- /dev/null
+++ b/doc/developer-guide/rpc-for-glusterfs.new-versions.md
@@ -0,0 +1,32 @@
+# GlusterFS RPC program versions
+
+## Compatibility
+
+The RPC layer of glusterfs is implemented with possible changes to the protocol layers in mind. If there are any changes in the FOPs between what is assumed on the client side and what is on the server side, they are to be added as a separate program table.
+
+### Program tables and Versions
+
+A given RPC program has a specific task and version, along with the actors belonging to the program. If a new actor is added to any of the programs, it is very important to define one more program with a different version, and then keep both if both are supported. Otherwise, it is important to handle the 'handshake' properly.
+
+#### Server details
+
+More info on RPC programs is in `rpc/rpc-lib/src/rpcsvc.h`; check the structures `rpcsvc_actor_t` and `struct rpcsvc_program`. For usage, check `xlators/protocol/server/src/server-rpc-fops.c`
+
+#### Client details
+
+For details on the client structures, check `rpc/rpc-lib/src/rpc-clnt.h` for `rpc_clnt_procedure_t` and `rpc_clnt_program_t`. For usage, check `xlators/protocol/client/src/client-rpc-fops.c`
+
+## Protocol
+
+A protocol is what is agreed upon between two parties. In glusterfs, an RPC protocol is defined as a .x file, which then gets converted to .c/.h files using `rpcgen`. There are different protocols defined for communication between `xlators/mgmt/glusterd <==> glusterfsd`, `gluster CLI <==> glusterd`, and `client-protocol <==> server-protocol`.
+
+Once a protocol is defined and a release is made with that protocol, make sure no one changes it. Any edit to a given structure there should be a new version of the structure, and it should get used in a new actor, and thus a new program version.
+
+## Server and Client Handshake
+
+When a client succeeds in establishing a connection (over any transport: socket, ib-verbs or unix), the client sends a dump (GF_DUMP_DUMP) request to the server, which will respond with all the supported versions of the server RPC (the supported programs which are registered with `rpcsvc_program_register()`).
+
+A client which expects certain programs to be present on the server should take care of looking for them in the handshake methods, and take appropriate action depending on what to do next. In general, a compatibility issue should be handled at the handshake level itself; thus we can clearly let the user/admin know of any 'incompatibilities'.
+As a developer of the GlusterFS protocol layer, one just has to make sure *never to make changes to existing program structures*; new programs have to be added instead if required.
New programs can have the same actors as present in existing ones, and also a few more. They can even have the same actor behave differently, or take different parameters.
+
+If this is followed properly, there will be smooth upgrades/downgrades between versions. If not, it is technically a 100% guarantee of getting compatibility-related issues.
diff --git a/doc/developer-guide/syncop.md b/doc/developer-guide/syncop.md
new file mode 100644
index 00000000000..bcc8bd08e01
--- /dev/null
+++ b/doc/developer-guide/syncop.md
@@ -0,0 +1,72 @@
+# syncop framework
+A coroutines-based, cooperative multi-tasking framework.
+
+## Topics
+
+- Glossary
+- Lifecycle of a synctask
+- Existing usage
+
+## Glossary
+
+### syncenv
+
+syncenv is an object that provides access to a pool of worker threads. synctasks execute in a syncenv.
+
+### synctask
+
+synctask can be informally defined as a pair of function pointers, namely _the call_ and _the callback_ (see syncop.h for more details).
+
+    synctask_fn_t  - 'the call'
+    synctask_cbk_t - 'the callback'
+
+synctask has two modes of operation:
+
+1. The calling thread waits for the synctask to complete.
+2. The calling thread schedules the synctask and continues.
+
+synctask guarantees that the callback is called _after_ the call completes.
+
+### Lifecycle of a synctask
+
+A synctask can go through the following stages during execution.
+
+- CREATED - On calling synctask_create/synctask_new.
+- RUNNABLE - synctask is queued in env->runq.
+- RUNNING - When one of syncenv's worker threads calls synctask_switch_to.
+- WAITING - When a synctask calls synctask_yield.
+- DONE - When a synctask has run to completion.
+
+    +-------------------------------+
+    |            CREATED            |
+    +-------------------------------+
+      |
+      | synctask_new/synctask_create
+      v
+    +-------------------------------+
+    |    RUNNABLE (in env->runq)    | <+
+    +-------------------------------+  |
+      |                                |
+      | synctask_switch_to             |
+      v                                |
+    +------+  on task completion  +-------------------------------+  |
+    | DONE | <------------------- |            RUNNING            |  | synctask_wake/wake
+    +------+                      +-------------------------------+  |
+                                    |                                |
+                                    | synctask_yield/yield           |
+                                    v                                |
+                                  +-------------------------------+  |
+                                  |    WAITING (in env->waitq)    | -+
+                                  +-------------------------------+
+
+Note: A synctask is not guaranteed to run on the same thread throughout its lifetime. Every time a synctask yields, it is possible for it to run on a different thread.
diff --git a/doc/developer-guide/thread-naming.md b/doc/developer-guide/thread-naming.md
new file mode 100644
index 00000000000..513140d4437
--- /dev/null
+++ b/doc/developer-guide/thread-naming.md
@@ -0,0 +1,104 @@
+Thread Naming
+================
+Gluster processes spawn many threads; some threads are created by the libglusterfs library, while others are created by xlators. When the gfapi library is used in an application, some threads belong to the application and some are spawned by the gluster libraries. We also have features where n threads are spawned to act as worker threads for the same operation.
+
+In all the above cases, it is useful to be able to determine the list of threads that exist at runtime. Naming threads when you create them is the easiest way to provide that information to the kernel, so that it can then be queried by any means.
+
+How to name threads
+-------------------
+We have two wrapper functions in libglusterfs for creating threads. They take a name as an argument and set the thread name after its creation.
+
+```C
+gf_thread_create (pthread_t *thread, const pthread_attr_t *attr,
+                  void *(*start_routine)(void *), void *arg, const char *name)
+```
+
+```C
+gf_thread_create_detached (pthread_t *thread,
+                           void *(*start_routine)(void *), void *arg,
+                           const char *name)
+```
+
+As the max name length for a thread in POSIX is only 16 characters, including the '\0' character, you have to be a little creative with naming. Also, it is important that all Gluster threads have a common prefix. Considering these conditions, we have "glfs_" as the prefix for all threads created by these wrapper functions. It is the responsibility of the owner of the thread to provide the suffix part of the name. It does not have to be a descriptive name, as it has only 10 letters to work with. However, it should be unique enough that it can be matched with a table which describes it.
+
+If n threads are spawned to perform the same function, it is a must that the threads are numbered.
+
+Table of thread names
+---------------------
+Thread names don't have to be descriptive; however, they should be unique enough that they can be matched with the table below without ambiguity.
+
+- bdaio - block device aio
+- brfsscan - bit rot fs scanner
+- brhevent - bit rot event handler
+- brmon - bit rot monitor
+- brosign - bit rot one shot signer
+- brpobj - bit rot object processor
+- brsproc - bit rot scrubber
+- brssign - bit rot stub signer
+- brswrker - bit rot worker
+- clogc - changelog consumer
+- clogcbki - changelog callback invoker
+- clogd - changelog dispatcher
+- clogecon - changelog reverse connection
+- clogfsyn - changelog fsync
+- cloghcon - changelog history consumer
+- clogjan - changelog janitor
+- clogpoll - changelog poller
+- clogproc - changelog process
+- clogro - changelog rollover
+- ctrcomp - change time recorder compaction
+- dhtdf - dht defrag task
+- dhtdg - dht defrag start
+- dhtfcnt - dht rebalance file counter
+- ecshd - ec heal daemon
+- epollN - epoll thread
+- fdlwrker - fdl worker
+- fusenoti - fuse notify
+- fuseproc - fuse main thread
+- gdhooks - glusterd hooks
+- glfspoll - gfapi poller thread
+- idxwrker - index worker
+- iosdump - io stats dump
+- iotwr - io thread worker
+- jbrflush - jbr flush
+- leasercl - lease recall
+- memsweep - sweeper thread for mem pools
+- nfsauth - nfs auth
+- nfsnsm - nfs nsm
+- nfsudp - nfs udp mount
+- nlmmon - nfs nlm/nsm mon
+- posixaio - posix aio
+- posixfsy - posix fsync
+- posixhc - posix heal
+- posixjan - posix janitor
+- posixrsv - posix reserve
+- quiesce - quiesce dequeue
+- rdmaAsyn - rdma async event handler
+- rdmaehan - rdma completion handler
+- rdmarcom - rdma receive completion handler
+- rdmascom - rdma send completion handler
+- rpcsvcrh - rpcsvc request handler
+- scleanup - socket cleanup
+- shdheal - self heal daemon
+- sigwait - glusterfsd sigwaiter
+- spoller - socket poller
+- sprocN - syncop worker thread
+- tbfclock - token bucket filter token generator thread
+- timer - timer thread
+- upreaper - upcall reaper
diff --git a/doc/developer-guide/translator-development.md b/doc/developer-guide/translator-development.md
index 3bf7e153354..f75935519f6 100644
--- a/doc/developer-guide/translator-development.md
+++ b/doc/developer-guide/translator-development.md
@@ -472,7 +472,7 @@ hello
 Now let's interrupt the process and see where we are.
 
 ```
-^C
+
 0x0000003a0060b3dc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
@@ -680,4 +680,4 @@ Original author's site:
 Gluster community site:
- * [Translators](http://www.gluster.org/community/documentation/index.php/Translators)
+ * [Translators](https://docs.gluster.org/en/latest/Quick-Start-Guide/Architecture/#translators)
diff --git a/doc/developer-guide/writing-a-cloudsync-plugin.md b/doc/developer-guide/writing-a-cloudsync-plugin.md
new file mode 100644
index 00000000000..907860aaed8
--- /dev/null
+++ b/doc/developer-guide/writing-a-cloudsync-plugin.md
@@ -0,0 +1,164 @@
+## How to write your Cloudsync Plugin
+
+### Background
+
+The Cloudsync translator is part of the archival feature in Gluster. This translator
+does the retrieval/download part. Each cold file will be archived to a remote
+store (public or private cloud). On future access to the file, it will be
+retrieved from the remote store by the Cloudsync translator. Each remote store
+needs its own plugin. The Cloudsync translator will load this plugin and
+call the necessary plugin functions.
+
+Upload can be done by a script or program. There are some basic mandatory steps
+for uploading the data. A sample crawl-and-upload script is given at the end of
+this guide.
+
+### Necessary changes to create a plugin
+
+1. Define store_methods:
+
+* This structure is the container of the basic functions that will be called by
+  the cloudsync xlator.
+
+      typedef struct store_methods {
+          int (*fop_download) (call_frame_t *frame, void *config);
+          /* return type should be the store config */
+          void *(*fop_init) (xlator_t *this);
+          int (*fop_reconfigure) (xlator_t *this, dict_t *options);
+          void (*fop_fini) (void *config);
+      } store_methods_t;
+
+
+  Member details:
+      fop_download:
+          This is the download function pointer.
+
+          frame: This carries the fd used to write the data downloaded from
+                 the cloud to GlusterFS (frame->local->fd).
+
+          config: This is the plugin configuration variable.
+
+          Note: Structure cs_local_t has members dlfd and dloffset, which
+          can be used to manage the writes to GlusterFS.
+          Include cloudsync-common.h to access these structures.
+
+      fop_init:
+          This is similar to xlator init, but here the return value is
+          the plugin configuration pointer. This pointer will be stored
+          in the cloudsync private object (priv->stores->config), and
+          the cloudsync private object can be accessed through "this->private",
+          where "this" is of type xlator_t.
+
+      fop_reconfigure:
+          This is similar to xlator reconfigure.
+
+      fop_fini:
+          Free plugin resources.
+
+  Note: store_methods_t is part of cs_private_t, which in turn is part of
+  xlator_t. Create a store_methods_t object named "store_ops" in
+  your plugin. For example:
+
+      store_methods_t store_ops = {
+          .fop_download = aws_download_s3,
+          .fop_init = aws_init,
+          .fop_reconfigure = aws_reconfigure,
+          .fop_fini = aws_fini,
+      };
+
+
+2. Making the Cloudsync xlator aware of the plugin:
+
+   Add an entry to the cs_plugin structure. For example:
+
+       struct cs_plugin plugins[] = {
+           {
+               .name = "amazons3",
+               .library = "libamazons3.so",
+               .description = "amazon s3 store."
+           },
+
+           {.name = NULL},
+       };
+
+   Description of the individual members:
+       name: name of the plugin.
+       library: the shared object created; Cloudsync will load
+                this library during init.
+       description: describes the plugin.
+
+3. Makefile changes in Cloudsync:
+
+   Add <plugin>.la to the cloudsync_la_LIBADD variable.
+
+4. Configure.ac changes:
+
+   In the cloudsync section, add the necessary dependency checks for
+   the plugin.
+
+5. Export symbols:
+
+   Cloudsync needs "store_ops" to resolve all plugin functions.
+   Create a file <plugin>.sym and write "store_ops" to it.
+
+
+### Sample script for upload
+This script assumes Amazon S3 is the target cloud and that the bucket name is
+gluster-bucket. Users can do the necessary AWS configuration using the command
+"aws configure". Currently for amazons3 there are four gluster settings
+available:
+1. features.s3plugin-seckey -> s3 secret key
+2. features.s3plugin-keyid -> s3 key id
+3. features.s3plugin-bucketid -> bucketid
+4. features.s3plugin-hostname -> hostname e.g. s3.amazonaws.com
+
+Additionally, set the cloudsync storetype to amazons3:
+
+gluster v set <VOLNAME> cloudsync-storetype amazons3
+
+Now create a mount dedicated to this upload task.
+
+That covers the necessary configuration.
+
+Below is the sample script for upload. The script crawls directly on the
+brick and uploads those files which have not been modified for the given
+number of days (the coldness). It takes three arguments:
+1st argument - Gluster brick path
+2nd argument - coldness, i.e. the number of days since the file was last modified
+3rd argument - dedicated gluster mount point created for uploading
+
+Once the cloud setup is done, run the following script on individual bricks.
+Note: For an AFR volume, pick only the fully synchronized brick among the
+replica bricks.
+
```
target_folder=$1
coldness=$2
mnt=$3

cd $target_folder
# pick regular files older than $coldness days, skipping gluster-internal files
for i in `find . -type f -mtime +$coldness | grep -v "glusterfs" | sed 's/..//'`
do
    echo "processing $mnt/$i"

    #check whether the file is already archived
    getfattr -n trusted.glusterfs.cs.remote $i &> /dev/null
    if [ $? -eq 0 ]
    then
        echo "file $mnt/$i is already archived"
    else
        #upload to cloud
        aws s3 cp $mnt/$i s3://gluster-bucket/
        mtime=`stat -c "%Y" $mnt/$i`

        #post processing of upload
        setfattr -n trusted.glusterfs.csou.complete -v $mtime $mnt/$i
        if [ $? -ne 0 ]
        then
            echo "archiving of file $mnt/$i failed"
        else
            echo "archiving of file $mnt/$i succeeded"
        fi

    fi
done
```
diff --git a/doc/developer-guide/xlator-classification.md b/doc/developer-guide/xlator-classification.md
new file mode 100644
index 00000000000..6073df9375f
--- /dev/null
+++ b/doc/developer-guide/xlator-classification.md
@@ -0,0 +1,221 @@
+# xlator categories and expectations
+
+The purpose of this document is to define a category for each xlator
+and the expectations around what each category means from the perspective of
+the health and maintenance of an xlator.
+
+This is needed to ensure certain categories are kept in good health, and it
+helps the community and contributors focus their efforts around the same.
+
+This document also provides implementation details for xlator developers to
+declare a category for any xlator.
+
+## Table of contents
+1. Audience
+2. Categories (and expectations of each category)
+3. Implementation and usage details
+
+## Audience
+
+This document is intended for the following community participants,
+- New xlator contributors
+- Existing xlator maintainers
+- Packaging and gluster management stack maintainers
+
+For a more user-facing understanding, it is recommended to read section (TBD)
+in the gluster documentation.
+
+## Categories
+1. Experimental (E)
+2. TechPreview (TP)
+3. Maintained (M)
+4. Deprecated (D)
+5. Obsolete (O)
+
+### Experimental (E)
+
+Developed in the experimental branch, for exploring new features. These xlators
+are NEVER packaged as a part of releases; interested users and contributors can
+build and work with them from source.
In the future, these may be
+available as a package based on a weekly build of the same.
+
+#### Quality expectations
+- Compiles and passes smoke tests
+- Does not break nightly experimental regressions
+  - NOTE: If a nightly is broken, then all patches that were merged are reverted
+    till the errant patch is found and subsequently fixed
+
+### TechPreview (TP)
+
+Xlators in master or release branches that are not deemed fit for
+production deployments, but are feature complete enough to invite feedback and
+host user data.
+
+These xlators will be worked on with priority by the maintainers/authors who are
+involved in making them more stable than xlators in the Experimental/Deprecated/
+Obsolete categories.
+
+There is no guarantee that these xlators will move to the Maintained state; they
+may just get obsoleted based on feedback, or on other project goals or technical
+alternatives.
+
+#### Quality expectations
+- Same as Maintained, minus
+  - Performance, Scale, other(?)
+  - *TBD* *NOTE* Need inputs; the intention is all quality goals as in Maintained,
+    other than the list above (which for now has scale and performance)
+
+### Maintained (M)
+
+These xlators are part of the core Gluster functionality and are actively
+maintained. They are part of master and release branches and are higher in
+the priority of maintainers and other interested contributors.
+
+#### Quality expectations
+
+NOTE: A short note on what each of these means is added here; details to follow.
+
+NOTE: Not all of the following are mandated out of the gate; consider the
+following a desirable state to reach as we progress on each
+
+- Bug backlog: Actively address the bug backlog
+- Enhancement backlog: Actively maintain the outstanding enhancement backlog (need
+  not be acted on, but should be visible to all)
+- Review backlog: Actively keep this below desired counts and states
+- Static code health: Actively meet near-zero issues in this regard
+  - Coverity, spellcheck and other checks
+- Runtime code health: Actively meet defined coverage levels in this regard
+  - Coverage, others?
+  - Per-patch regressions
+  - Glusto runs
+  - Performance
+  - Scalability
+- Technical specifications: Implementation details should be documented and
+  updated at a regular cadence (even per patch that changes assumptions in
+  here)
+- User documentation: User facing details should be maintained to current
+  status in the documentation
+- Debuggability: Steps, tools, procedures should be documented and maintained
+  each release/patch as applicable
+- Troubleshooting: Steps, tools, procedures should be documented and maintained
+  each release/patch as applicable
+  - Steps/guides for self service
+  - Knowledge base for problems
+- Other common criteria that will apply: Required metrics/desired states to be
+  defined per criterion
+  - Monitoring, usability, statedump, and other such xlator expectations
+
+### Deprecated (D)
+
+Xlators on master or release branches that will be obsoleted and/or replaced
+with similar or other functionality in the next major release.
+
+#### Quality expectations
+- Retain the status quo when moved to this state, until the xlator is moved to
+  Obsolete
+- Provide migration steps if the feature provided by the xlator is replaced with
+  other xlators
+
+### Obsolete (O)
+
+Xlator/code still in the tree, but not packaged, shipped, or maintained in any
+form. This is noted as a category until the code is removed from the tree.
+
+Tests for these xlators will not be executed, and their code and test health
+will not be tracked.
+
+#### Quality expectations
+- None
+
+## Implementation and usage details
+
+### How to specify an xlator's category
+
+While defining the 'xlator_api_t' structure for the corresponding xlator, add a
+flag as below:
+
```
diff --git a/xlators/performance/nl-cache/src/nl-cache.c b/xlators/performance/nl-cache/src/nl-cache.c
index 0f0e53bac2..8267d6897c 100644
--- a/xlators/performance/nl-cache/src/nl-cache.c
+++ b/xlators/performance/nl-cache/src/nl-cache.c
@@ -869,4 +869,5 @@ xlator_api_t xlator_api = {
     .cbks = &nlc_cbks,
     .options = nlc_options,
     .identifier = "nl-cache",
+    .category = GF_TECH_PREVIEW,
 };
diff --git a/xlators/performance/quick-read/src/quick-read.c b/xlators/performance/quick-read/src/quick-read.c
index 8d39720e7f..235de27c19 100644
--- a/xlators/performance/quick-read/src/quick-read.c
+++ b/xlators/performance/quick-read/src/quick-read.c
@@ -1702,4 +1702,5 @@ xlator_api_t xlator_api = {
     .cbks = &qr_cbks,
     .options = qr_options,
     .identifier = "quick-read",
+    .category = GF_MAINTAINED,
 };
```
+
+Similarly, if a particular option is in a different state than the xlator itself,
+the same flag can be added in the options structure too.
+
```
diff --git a/xlators/cluster/afr/src/afr.c b/xlators/cluster/afr/src/afr.c
index 0e86e33d03..81996743d1 100644
--- a/xlators/cluster/afr/src/afr.c
+++ b/xlators/cluster/afr/src/afr.c
@@ -772,6 +772,7 @@ struct volume_options options[] = {
         .description = "Maximum latency for shd halo replication in msec."
     },
     { .key = {"halo-enabled"},
+      .category = GF_TECH_PREVIEW,
       .type = GF_OPTION_TYPE_BOOL,
       .default_value = "False",

```
+
+
+### User experience using the categories
+
+#### Ability to use a category
+
+This section details when each category of xlators can be used, and the
+specifics of how each category is enabled.
+
+1. Maintained category xlators can be used by default; this implies that volumes
+created with these xlators enabled will throw no warnings and need no user
+intervention to use the xlator.
+
+2. Tech Preview category xlators need cluster configuration changes to allow
+these xlators to be used in volumes; further, logs will contain a message
+stating TP xlators are in use. Without the cluster configured to allow TP
+xlators, volumes created or edited to use such xlators will result in errors.
+  - (TBD) Cluster configuration option
+  - (TBD) Warning message
+  - (TBD) Code mechanics on how this is achieved
+
+3. Deprecated category xlators can be used by default, but will throw a warning
+in the logs that such xlators are in use and will be deprecated in the future.
+  - (TBD) Warning message
+
+4. Obsolete category xlators will not be packaged and hence cannot be used from
+release builds.
+
+5. Experimental category xlators will not be packaged and hence cannot be used
+from release builds; if running experimental (weekly or other such) builds,
+these will throw a warning in the logs stating experimental xlators are in use.
+  - (TBD) Warning message
+
+#### Ability to query xlator category
+
+(TBD) Need to provide the ability to query xlator categories, or list xlators
+and their respective categories.
+
+#### User facing changes
+
+User facing changes that are expected due to this work include the following:
+- A cluster-wide option to enable TP xlators, or more generically a category
+  level of xlators
+- Errors in commands that fail due to invalid categories
+- Warning messages in logs to denote that certain categories of xlators are in use
+- (TBD) Ability to query xlators and their respective categories
diff --git a/doc/features/ctime.md b/doc/features/ctime.md
new file mode 100644
index 00000000000..74a77abed4b
--- /dev/null
+++ b/doc/features/ctime.md
@@ -0,0 +1,68 @@
+# Consistent time attributes in gluster across replica/distribute
+
+
+#### Problem:
+Traditionally gluster has been using the time attributes (ctime, atime, mtime) of files/dirs from the bricks. The problem with this approach is that it is not consistent across replica and distribute bricks. Applications which depend on these attributes break, as a replica might not always return the time attributes from the same brick.
+
+Tar especially gives "file changed as we read it" whenever it detects ctime differences when stat is served from different bricks. The way we have been trying to solve this is to serve the stat structures from the same brick in afr, and the max time in dht. But that doesn't avoid the problem completely. Because there is no way to change ctime at the moment (lutimes() only allows mtime and atime), there is little we can do to make sure ctimes match after self-heals/xattr updates/rebalance.
+
+#### Solution Proposed:
+Store the time attributes (ctime, mtime, atime) as an xattr of the file. The xattr is updated based
+on the fop. If a filesystem fop changes only mtime and ctime, update only those in the xattr for
+that file.
+
+#### Design Overview:
+1) As part of each fop, the top layer will generate a timestamp and pass it down along
+   with other information
+   - This introduces a dependency on NTP-synced clients along with the servers
+   - There can be a difference in time if the fop gets stuck in an xlator for various
+     reasons, for example because of locks.
+
+2) On the server, the posix layer stores the value in memory (inode ctx) and syncs the data periodically to disk as an extended attr
+   - Of course, a sync call will also force it. And if a fop comes for an inode which is not linked, we do the sync immediately.
+
+3) Each time inodes are created or initialized, the data is read from disk and stored in the inode ctx.
+
+4) Before setting it in the inode_ctx, we compare the stored timestamp and the received timestamp, and only store the received value if the stored value is less than it (see the sketch after this overview).
+
+5) So in the best case, data will be stored in and retrieved from memory. We replace the values in iatt with the values in the inode_ctx.
+
+6) File ops that change the parent directory's time attributes need to be consistent across all the distributed directories across the subvolumes. (For example, a create call will change the ctime and mtime of the parent dir.)
+
+   - This has to be handled separately because we only send the fop to the hashed subvolume.
+   - We can asynchronously send the time-update setattr fop to the other subvolumes and change the values of the parent directory if the file fop is successful on the hashed subvolume.
+   - This will have a window where the times are inconsistent across dht subvolumes (please provide your suggestions).
+
+7) Currently we have a couple of mount options for time attributes, like noatime, relatime, nodiratime, etc. These options are not explicitly handled even if they are given as mount options for a gluster mount.
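+
+As a rough illustration of point (4): a minimal sketch in C, not the actual
+posix-xlator code. The structure and function names (mdata_ts,
+mdata_update_ctime) are invented for this example; the real implementation
+works on the inode ctx and feeds the periodic xattr sync.
+
+```C
+#include <stdint.h>
+
+/* Hypothetical in-memory copy of the ctime kept in the inode ctx;
+ * atime/mtime fields are elided. */
+struct mdata_ts {
+        int64_t ctime_sec;
+        int64_t ctime_nsec;
+};
+
+/* Store the received ctime only if it is newer than the cached one, so a
+ * delayed fop carrying an older timestamp never overwrites a newer value. */
+static void
+mdata_update_ctime (struct mdata_ts *cached, int64_t sec, int64_t nsec)
+{
+        if (sec > cached->ctime_sec ||
+            (sec == cached->ctime_sec && nsec > cached->ctime_nsec)) {
+                cached->ctime_sec = sec;
+                cached->ctime_nsec = nsec;
+                /* marking the ctx dirty for the periodic sync to the
+                 * on-disk xattr is elided here */
+        }
+}
+```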
+
+
+#### Implementation Overview:
+This feature involves changes in the following xlators.
+ - utime xlator
+ - posix xlator
+
+##### utime xlator:
+This is a new client-side xlator which does the following tasks.
+
+1. It generates a timestamp and passes it down in frame->root->ctime and over the network.
+2. Based on the fop, it also decides the time attributes to be updated; this is passed using "frame->root->flags".
+
+   Patches:
+   1. https://review.gluster.org/#/c/19857/
+
+##### posix xlator:
+The following tasks are done in the posix xlator:
+
+1. It provides APIs to set and get the xattr from the backend. It also caches the xattr in the inode context. During get, it copies the time attributes stored in the xattr into the iatt structure.
+2. Based on the flags from the utime xlator, the relevant fops update the time attributes in the xattr.
+
+   Patches:
+   1. https://review.gluster.org/#/c/19267/
+   2. https://review.gluster.org/#/c/19795/
+   3. https://review.gluster.org/#/c/19796/
+
+#### Pending Work:
+1. Handling of time-related mount options (noatime, relatime, etc.)
+2. Flag-based create (depending on the flags in open, create behaviour might change)
+3. Changes in dht for directory sync across multiple subvolumes
+4. readdirp stat needs to be worked on
diff --git a/doc/gluster.8 b/doc/gluster.8
index 9780264d537..ba595edca15 100644
--- a/doc/gluster.8
+++ b/doc/gluster.8
@@ -16,15 +16,14 @@ gluster - Gluster Console Manager (command line utility)
 .PP
 To run the program and display gluster prompt:
 .PP
-.B gluster [--xml]
+.B gluster [--remote-host=<gluster_node>] [--mode=script] [--xml]
 .PP
 (or)
 .PP
 To specify a command directly:
 .PP
 .B gluster
-.I [commands] [options] [--xml]
-
+.I [commands] [options] [--remote-host=<gluster_node>] [--mode=script] [--xml]
 .SH DESCRIPTION
 The Gluster Console Manager is a command line utility for elastic volume management. You can run the gluster command on any export server. The command enables administrators to perform cloud operations, such as creating, expanding, shrinking, rebalancing, and migrating volumes without needing to schedule server downtime.
 .SH COMMANDS
@@ -36,7 +35,13 @@ The Gluster Console Manager is a command line utility for elastic volume managem
 \fB\ volume info [all|<VOLNAME>] \fR
 Display information about all volumes, or the specified volume.
 .TP
-\fB\ volume create <NEW-VOLNAME> [stripe <COUNT>] [replica <COUNT>] [disperse [<COUNT>]] [redundancy <COUNT>] [transport <tcp|rdma|tcp,rdma>] <NEW-BRICK> ... \fR
+\fB\ volume list \fR
+List all volumes in the cluster
+.TP
+\fB\ volume status [all | <VOLNAME> [nfs|shd|<BRICK>|quotad]] [detail|clients|mem|inode|fd|callpool|tasks|client-list] \fR
+Display the status of all volumes, or of the specified volume(s)/brick
+.TP
+\fB\ volume create <NEW-VOLNAME> [stripe <COUNT>] [[replica <COUNT> [arbiter <COUNT>]]|[replica 2 thin-arbiter 1]] [disperse [<COUNT>]] [disperse-data <COUNT>] [redundancy <COUNT>] [transport <tcp|rdma|tcp,rdma>] <NEW-BRICK> ... <TA-BRICK> \fR
 Create a new volume of the specified type using the specified bricks and transport type (the default transport type is tcp). To create a volume with both transports (tcp and rdma), give 'transport tcp,rdma' as an option.
 .TP
@@ -52,8 +57,17 @@ Stop the specified volume.
 \fB\ volume set <VOLNAME> <OPTION> <PARAMETER> [<OPTION> <PARAMETER>] ... \fR
 Set the volume options.
 .TP
-\fB\ volume get <VOLNAME> <OPTION/all>\fR
-Get the volume options.
+\fB\ volume get <VOLNAME/all> <OPTION/all> \fR
+Get the value of all options or of a given option for volume <VOLNAME>, or for all volumes.
'gluster volume get all all' retrieves all the global options.
+.TP
+\fB\ volume reset <VOLNAME> [option] [force] \fR
+Reset all the reconfigured options
+.TP
+\fB\ volume barrier <VOLNAME> {enable|disable} \fR
+Barrier/unbarrier file operations on a volume
+.TP
+\fB\ volume clear-locks <VOLNAME> <path> kind {blocked|granted|all}{inode [range]|entry [basename]|posix [range]} \fR
+Clear locks held on path
 .TP
 \fB\ volume help \fR
 Display help for the volume command.
@@ -71,6 +85,9 @@ If you remove the brick, the data stored in that brick will not be available. Yo
 .B replace-brick
 option.
 .TP
+\fB\ volume reset-brick <VOLNAME> <SOURCE-BRICK> {{start} | {<NEW-BRICK> commit}} \fR
+Brings down or replaces the specified source brick with the new brick.
+.TP
 \fB\ volume replace-brick <VOLNAME> <SOURCE-BRICK> <NEW-BRICK> commit force \fR
 Replace the specified source brick with a new brick.
 .TP
@@ -92,6 +109,18 @@ Locate the log file for corresponding volume/brick.
 .TP
 \fB\ volume log rotate <VOLNAME> [BRICK] \fB
 Rotate the log file for corresponding volume/brick.
+.TP
+\fB\ volume profile <VOLNAME> {start|info [peek|incremental [peek]|cumulative|clear]|stop} [nfs] \fR
+Profile operations on the volume. Once started, volume profile <volname> info provides cumulative statistics of the FOPs performed.
+.TP
+\fB\ volume top <VOLNAME> {open|read|write|opendir|readdir|clear} [nfs|brick <brick>] [list-cnt <value>] | {read-perf|write-perf} [bs <size> count <count>] [brick <brick>] [list-cnt <value>] \fR
+Generates a profile of a volume representing the performance and bottlenecks/hotspots of each brick.
+.TP
+\fB\ volume statedump <VOLNAME> [[nfs|quotad] [all|mem|iobuf|callpool|priv|fd|inode|history]... | [client <hostname:process-id>]] \fR
+Dumps the in-memory state of the specified process or of the bricks of the volume.
+.TP
+\fB\ volume sync <HOSTNAME> [all|<VOLNAME>] \fR
+Sync the volume information from a peer
 .SS "Peer Commands"
 .TP
 \fB\ peer probe <HOSTNAME> \fR
@@ -103,27 +132,58 @@ Detach the specified peer.
 \fB\ peer status \fR
 Display the status of peers.
 .TP
+\fB\ pool list \fR
+List all the nodes in the pool (including localhost)
+.TP
 \fB\ peer help \fR
 Display help for the peer command.
-.SS "Tier Commands"
+.SS "Quota Commands"
+.TP
+\fB\ volume quota <VOLNAME> enable \fR
+Enable quota on the specified volume. This will cause all the directories in the filesystem hierarchy to be accounted and updated thereafter on each operation in the filesystem. To kick-start this accounting, a crawl is done over the hierarchy with an auxiliary client.
+.TP
+\fB\ volume quota <VOLNAME> disable \fR
+Disable quota on the volume. This will disable enforcement and accounting in the filesystem. Any configured limits will be lost.
 .TP
-\fB\ volume tier <VOLNAME> attach [<replica COUNT>] <NEW-BRICK>... \fR
-Attach to an existing volume a tier of specified type using the specified bricks.
+\fB\ volume quota <VOLNAME> limit-usage <PATH> <SIZE> [<PERCENT>] \fR
+Set a usage limit on the given path. Any previously set limit is overridden by the new value. The soft limit can optionally be specified (as a percentage of the hard limit). If the soft limit percentage is not provided, the default soft limit value for the volume is used to decide the soft limit.
 .TP
-\fB\ volume tier <VOLNAME> status \fR
-Display statistics on data migration between the hot and cold tiers.
+\fB\ volume quota <VOLNAME> limit-objects <PATH> <SIZE> [<PERCENT>] \fR
+Set an inode limit on the given path.
Any previously set limit is overridden by the new value. The soft limit can optionally be specified (as a percentage of the hard limit). If the soft limit percentage is not provided, the default soft limit value for the volume is used to decide the soft limit.
 .TP
-\fB\ volume tier <VOLNAME> detach start\fR
-Begin detaching the hot tier from the volume. Data will be moved from the hot tier to the cold tier.
+NOTE: valid units of SIZE are: B, KB, MB, GB, TB, PB. If no unit is specified, the unit defaults to bytes.
 .TP
-\fB\ volume tier <VOLNAME> detach commit [force]\fR
-Commit detaching the hot tier from the volume. The volume will revert to its original state before the hot tier was attached.
+\fB\ volume quota <VOLNAME> remove <PATH> \fR
+Remove any usage limit configured on the specified directory. Note that if any limit is configured on the ancestors of this directory (previous directories along the path), they will still be honored and enforced.
 .TP
-\fB\ volume tier <VOLNAME> detach status\fR
-Check status of data movement from the hot to cold tier.
+\fB\ volume quota <VOLNAME> remove-objects <PATH> \fR
+Remove any inode limit configured on the specified directory. Note that if any limit is configured on the ancestors of this directory (previous directories along the path), they will still be honored and enforced.
 .TP
-\fB\ volume tier <VOLNAME> detach stop\fR
-Stop detaching the hot tier from the volume.
+\fB\ volume quota <VOLNAME> list <PATH> \fR
+Lists the usage and limits configured on the directory(s). If a path is given, only the limit that has been configured on that directory (if any) is displayed along with the directory's usage. If no path is given, usage and limits are displayed for all directories that have limits configured.
+.TP
+\fB\ volume quota <VOLNAME> list-objects <PATH> \fR
+Lists the inode usage and inode limits configured on the directory(s). If a path is given, only the limit that has been configured on that directory (if any) is displayed along with the directory's inode usage. If no path is given, usage and limits are displayed for all directories that have limits configured.
+.TP
+\fB\ volume quota <VOLNAME> default-soft-limit <PERCENT> \fR
+Set the percentage value for the default soft limit for the volume.
+.TP
+\fB\ volume quota <VOLNAME> soft-timeout <TIME> \fR
+Set the soft timeout for the volume. The interval in which limits are retested before the soft limit is breached.
+.TP
+\fB\ volume quota <VOLNAME> hard-timeout <TIME> \fR
+Set the hard timeout for the volume. The interval in which limits are retested after the soft limit is breached.
+.TP
+\fB\ volume quota <VOLNAME> alert-time <TIME> \fR
+Set the frequency at which warning messages need to be logged (in the brick logs) once the soft limit is breached.
+.TP
+\fB\ volume inode-quota <VOLNAME> enable/disable \fR
+Enable/disable inode-quota for <VOLNAME>
+.TP
+\fB\ volume quota help \fR
+Display help for volume quota commands
+.TP
+NOTE: valid units of time and their symbols are: hours(h/hr), minutes(m/min), seconds(s/sec), weeks(w/wk), days(d/days).
 .SS "Geo-replication Commands"
 .TP
 \fI\ Note\fR: password-less ssh, from the master node (where these commands are executed) to the slave node <SLAVE_HOST>, is a prerequisite for the geo-replication commands.
@@ -131,8 +191,10 @@ Stop detaching the hot tier from the volume.
 \fB\ system:: execute gsec_create\fR
 Generates pem keys which are required for push-pem
 .TP
-\fB\ volume geo-replication <MASTER_VOL> <SLAVE_HOST>::<SLAVE_VOL> create [push-pem] [force]\fR
+\fB\ volume geo-replication <MASTER_VOL> <SLAVE_HOST>::<SLAVE_VOL> create [[ssh-port n][[no-verify]|[push-pem]]] [force]\fR
 Create a new geo-replication session from <MASTER_VOL> to <SLAVE_HOST> host machine having <SLAVE_VOL>.
+Use ssh-port n if a custom SSH port is configured on the slave nodes.
+Use no-verify if the rsa-keys of the nodes in the master volume are distributed to the slave nodes through an external agent.
 Use push-pem to push the keys automatically.
 .TP
 \fB\ volume geo-replication <MASTER_VOL> <SLAVE_HOST>::<SLAVE_VOL> {start|stop} [force] \fR
@@ -156,19 +218,25 @@ Use "!<OPTION>" to reset option <OPTION> to default value.
 \fB\ volume bitrot <VOLNAME> {enable|disable} \fR
 Enable/disable bitrot for volume <VOLNAME>
 .TP
+\fB\ volume bitrot <VOLNAME> signing-time <time-in-secs> \fR
+Waiting time for an object, after the last fd is closed, before the signing process starts.
+.TP
+\fB\ volume bitrot <VOLNAME> signer-threads <count> \fR
+Number of signing process threads. Usually set to the number of available cores.
+.TP
 \fB\ volume bitrot <VOLNAME> scrub-throttle {lazy|normal|aggressive} \fR
 Scrub-throttle value is a measure of how fast or slow the scrubber scrubs the filesystem for volume <VOLNAME>
 .TP
-\fB\ volume bitrot <VOLNAME> scrub-frequency {daily|weekly|biweekly|monthly} \fR
+\fB\ volume bitrot <VOLNAME> scrub-frequency {hourly|daily|weekly|biweekly|monthly} \fR
 Scrub frequency for volume <VOLNAME>
 .TP
-\fB\ volume bitrot <VOLNAME> scrub {pause|resume} \fR
-Pause/Resume scrub. Upon resume, scrubber continues where it left off.
+\fB\ volume bitrot <VOLNAME> scrub {pause|resume|status|ondemand} \fR
+Pause/resume scrub. Upon resume, the scrubber continues where it left off. The status option shows the statistics of the scrubber. The ondemand option starts scrubbing immediately if the scrubber is not paused or already running.
+.TP
+\fB\ volume bitrot help \fR
+Display help for volume bitrot commands
 .TP
-\fB\ volume bitrot <VOLNAME> scrub status \fR
-Show the statistics of scrubber status
 .SS "Snapshot Commands"
-.PP
 .TP
 \fB\ snapshot create <snapname> <volname> [no-timestamp] [description <description>] [force] \fR
 Creates a snapshot of a GlusterFS volume. User can provide a snap-name and a description to identify the snap. Snap will be created by appending timestamp in GMT. User can override this behaviour using "no-timestamp" option. The description cannot be more than 1024 characters. To be able to take a snapshot, volume should be present and it should be in started state.
@@ -271,6 +339,9 @@ Selects <HOSTNAME:BRICKNAME> as the source for all the files that are in split-b
 Selects the split-brained <FILE> present in <HOSTNAME:BRICKNAME> as source and completes the heal.
 .SS "Other Commands"
 .TP
+\fB\ get-state [<daemon>] [[odir </path/to/output/dir/>] [file <filename>]] [detail|volumeoptions] \fR
+Get the local state representation of the mentioned daemon and store the data at the provided path
+.TP
 \fB\ help \fR
 Display the command options.
 .TP
diff --git a/doc/glusterd.8 b/doc/glusterd.8
index 04a43481eec..e3768c78761 100644
--- a/doc/glusterd.8
+++ b/doc/glusterd.8
@@ -30,6 +30,9 @@ File to use for logging.
 \fB\-L <LOGLEVEL>, \fB\-\-log\-level=<LOGLEVEL>\fR
 Logging severity. Valid options are TRACE, DEBUG, INFO, WARNING, ERROR and CRITICAL (the default is INFO).
 .TP
+\fB\-\-localtime\-logging\fR
+Enable localtime log timestamps.
+.TP
 \fB\-\-debug\fR
 Run the program in debug mode. This option sets \fB\-\-no\-daemon\fR, \fB\-\-log\-level\fR to DEBUG and \fB\-\-log\-file\fR to console.
diff --git a/doc/glusterfs.8 b/doc/glusterfs.8
index fc28ef68be6..3d359ea85e4 100644
--- a/doc/glusterfs.8
+++ b/doc/glusterfs.8
@@ -53,6 +53,9 @@ Maximum number of connect attempts to server. This option should be provided wit
 \fB\-\-acl\fR
 Mount the filesystem with POSIX ACL support.
 .TP
+\fB\-\-localtime\-logging\fR
+Enable localtime log timestamps.
+.TP
 \fB\-\-debug\fR
 Run in debug mode. This option sets \fB\-\-no\-daemon\fR, \fB\-\-log\-level\fR to DEBUG,
 and \fB\-\-log\-file\fR to console.
@@ -60,8 +63,8 @@ and \fB\-\-log\-file\fR to console.
 \fB\-\-enable\-ino32=BOOL\fR
 Use 32-bit inodes when mounting to workaround application that doesn't support 64-bit inodes.
 .TP
-\fB\-\-fopen\-keep\-cache\fR
-Do not purge the cache on file open.
+\fB\-\-fopen\-keep\-cache[=BOOL]\fR
+Do not purge the cache on file open (default: false).
 .TP
 \fB\-\-mac\-compat=BOOL\fR
 Provide stubs for attributes needed for seamless operation on Macs (the default is off).
@@ -98,11 +101,17 @@ Mount the filesystem in 'worm' mode.
 .TP
 \fB\-\-xlator\-option=VOLUME\-NAME.OPTION=VALUE\fR
 Add/Override a translator option for a volume with the specified value.
+.TP
+\fB\-\-subdir\-mount=SUBDIR\-MOUNT\-PATH\fR
+Mount a subdirectory instead of the '/' of the volume.
 .SS "Fuse options"
 .PP
 .TP
+\fB\-\-attr\-times\-granularity=NANOSECONDS\fR
+Declare the supported granularity of file attribute times (the default is 0, which the kernel handles as unspecified; valid real values are between 1 and 1000000000).
+.TP
 \fB\-\-attribute\-timeout=SECONDS\fR
 Set attribute timeout to SECONDS for inodes in fuse kernel module (the default is 1).
 .TP
@@ -112,8 +121,8 @@ Set fuse module's background queue length to N (the default is 64).
 \fB\-\-congestion\-threshold=N\fR
 Set fuse module's congestion threshold to N (the default is 48).
 .TP
-\fB\-\-direct\-io\-mode=BOOL\fR
-Enable/Disable the direct-I/O mode in fuse module (the default is enable).
+\fB\-\-direct\-io\-mode=BOOL|auto\fR
+Specify the fuse direct I/O strategy (the default is auto).
 .TP
 \fB\-\-dump-fuse=PATH\f\R
 Dump fuse traffic to PATH
@@ -124,9 +133,17 @@ Set entry timeout to SECONDS in fuse kernel module (the default is 1).
 \fB\-\-gid\-timeout=SECONDS\fR
 Set auxiliary group list timeout to SECONDS for fuse translator (the default is 0).
 .TP
+\fB\-\-kernel-writeback-cache=BOOL\fR
+Enable the fuse in-kernel writeback cache.
+.TP
 \fB\-\-negative\-timeout=SECONDS\fR
 Set negative timeout to SECONDS in fuse kernel module (the default is 0).
 .TP
+\fB\-\-auto\-invalidation=BOOL\fR
+Controls whether the fuse kernel module can auto-invalidate attribute, dentry and
+page-cache. Disable this only if the same files/directories are not
+accessed across two different mounts concurrently [default: on].
+.TP
 \fB\-\-volfile-check\fR
 Enable strict volume file checking.
diff --git a/doc/glusterfsd.8 b/doc/glusterfsd.8
index 956cb24bca3..bc1de2a8c80 100644
--- a/doc/glusterfsd.8
+++ b/doc/glusterfsd.8
@@ -51,6 +51,9 @@ Server to get the volume from. This option overrides \fB\-\-volfile option
 .PP
 .TP
+\fB\-\-localtime\-logging\fR
+Enable localtime log timestamps.
+.TP
 \fB\-\-debug\fR
 Run in debug mode.
 This option sets \fB\-\-no\-daemon\fR, \fB\-\-log\-level\fR to DEBUG and \fB\-\-log\-file\fR to console
@@ -104,6 +107,11 @@ Enable/Disable direct-io mode in fuse module [default: enable]
 .TP
 \fB\-\-resolve-gids\fR
 Resolve all auxiliary groups in fuse translator (max 32 otherwise)
+.TP
+\fB\-\-auto\-invalidation=BOOL\fR
+Controls whether the fuse kernel module can auto-invalidate attribute, dentry and
+page-cache. Disable this only if the same files/directories are not
+accessed across two different mounts concurrently [default: on]
 .SS "Miscellaneous Options"
 .PP
diff --git a/doc/mount.glusterfs.8 b/doc/mount.glusterfs.8
index 4e82c2fd57d..ce16e9e40b7 100644
--- a/doc/mount.glusterfs.8
+++ b/doc/mount.glusterfs.8
@@ -12,11 +12,11 @@
 .SH NAME
 .B mount.glusterfs - script to mount native GlusterFS volume
 .SH SYNOPSIS
-.B mount -t glusterfs [-o <options>] <volumeserver>:/<volume>
+.B mount -t glusterfs [-o <options>] <volumeserver>:/<volume>[/<subdir>]
 .B <mountpoint>
 .TP
 .B mount -t glusterfs [-o <options>] <server1>,<server2>,
-.B <server3>,..<serverN>:/<volname> <mount_point>
+.B <server3>,..<serverN>:/<volname>[/<subdir>] <mount_point>
 .TP
 .TP
 .B mount -t glusterfs [-o <options>] <path/to/volumefile> <mountpoint>
@@ -44,8 +44,8 @@ INFO and NONE [default: INFO]
 \fBacl
 Mount the filesystem with POSIX ACL support
 .TP
-\fBfopen\-keep\-cache
-Do not purge the cache on file open
+\fBfopen\-keep\-cache[=BOOL]
+Do not purge the cache on file open (default: false)
 .TP
 \fBworm
 Mount the filesystem in 'worm' mode
@@ -65,6 +65,9 @@ Enable internal memory accounting
 .TP
 \fBcapability
 Enable file capability setting and retrival
+.TP
+\fBthin-client
+Enables thin mount and connects via the gfproxyd daemon
 .PP
 .SS "Advanced options"
@@ -89,12 +92,15 @@ Set negative timeout to SECONDS in fuse kernel module [default: 0]
 Volume name to be used for MOUNT-POINT [default: top most volume in VOLUME-FILE]
 .TP
-\fBdirect\-io\-mode=\fRdisable
-Disable direct I/O mode in fuse kernel module
+\fBdirect\-io\-mode=\fRBOOL|auto
+Specify the fuse direct I/O strategy [default: auto]
 .TP
 \fBcongestion\-threshold=\fRN
 Set fuse module's congestion threshold to N [default: 48]
 .TP
+\fBsubdir\-mount=\fRPATH
+Set the subdirectory mount option [default: NULL, i.e., no subdirectory mount]
+.TP
 .TP
 \fBbackup\-volfile\-servers=\fRSERVERLIST
 Provide list of backup volfile servers in the following format [default: None]
@@ -116,6 +122,15 @@ Provide list of backup volfile servers in the following format [default: None]
 \fBDeprecated\fR option - placed here for backward compatibility [default: 1]
 .TP
 .TP
+\fBlru-limit=\fRN
+Set fuse module's limit for number of inodes kept in LRU list to N [default: 65536]
+.TP
+.TP
+\fBinvalidate-limit=\fRN
+Suspend fuse invalidations implied by 'lru-limit' if the number of outstanding
+invalidations reaches N
+.TP
+.TP
 \fBbackground-qlen=\fRN
 Set fuse module's background queue length to N [default: 64]
 .TP
@@ -127,6 +142,20 @@ enable root squashing for the trusted client [default: on]
 .TP
 \fBuse\-readdirp=\fRBOOL
 Use readdirp() mode in fuse kernel module [default: on]
+.TP
+\fBdump\-fuse=\fRPATH
+Dump fuse traffic to PATH
+.TP
+\fBkernel\-writeback\-cache=\fRBOOL
+Enable the fuse in-kernel writeback cache [default: off]
+.TP
+\fBattr\-times\-granularity=\fRNS
+Declare the supported granularity of file attribute times [default: 0]
+.TP
+\fBauto\-invalidation=\fRBOOL
+Controls whether the fuse kernel module can auto-invalidate attribute, dentry and
+page-cache.
Disable this only if same files/directories are not +accessed across two different mounts concurrently [default: on] .PP .SH FILES .TP diff --git a/doc/release-notes/3.8.0.md b/doc/release-notes/3.8.0.md deleted file mode 100644 index 6c555aa5f57..00000000000 --- a/doc/release-notes/3.8.0.md +++ /dev/null @@ -1,1523 +0,0 @@ -# Release notes for Gluster 3.8.0 - -This is a major release that includes a huge number of changes. Many -improvements contribute to better support of Gluster with containers and -running your storage on the same server as your hypervisors. Lots of work has -been done to integrate with other projects that are part of the Open Source -storage ecosystem. - -The most notable features and changes are documented on this page. A full list -of bugs that has been addressed is included further below. - -## Major changes and features - -### Changes to building from the release tarball - -By default the release tarballs contain some of the scripts from the GNU -autotools projects. These scripts are used for detecting the environment where -the software is built. This includes operating system, architecture and more. - -Bundling these scripts in the tarball makes it mandatory for some distributions -to replace them with more updated versions. The scripts are included from the -host operating system where the tarball is generated. If this is an older -operating system (like RHEL/CentOS-6), the scripts are not current enough for -some build targets. - -Many distributions have the habit to replace the included `config.guess` and -`config.sub` scripts. The intention of our release tarball is to not include -the script at all, however that breaks some builds. We have now replaced these -scripts with dummy ones, and expect the build environment to replace the -scripts, or run `./configure` with the appropriate `--host=..` and `--build=..` -parameters. - -Building directly from the git repository has not changed. - -### Mandatory lock support for Multiprotocol environment -*Notes for users:* -With this release GlusterFS is now capable of performing file operations based -on core mandatory locking concepts. Apart from Linux kernel style semantics, -GlusterFS volumes can now be configured in a special mode where all traditional -fcntl locks are treated mandatory so as to detect the presence of locks before -every data modifying file operations acting on a particular byte range. This -will help applications to operate on more accurate data during concurrent access -of various byte ranges within a file. Please refer [Administration -Guide](http://gluster.readthedocs.org/en/latest/Administrator%20Guide/Mandatory%20Locks/) -for more details. - -### Gluster/NFS disabled by default -*Notes for users:* -The legacy Gluster NFS server (a.k.a. gNFS) is now disabled by default when new -volumes are created. Users are encouraged to use NFS-Ganesha with FSAL_GLUSTER -instead of gNFS. NFS-Ganesha is a full feature server that is being actively -developed and maintained. It supports NFSv3, NFSv4, and NFSv4.1. The -[documentation](http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Intergration/) -describes how to configure and use NFS-Ganesha. Users that prefer to use the -gNFS server (NFSv3 only) can enable the service per volume with the following -command: - -```bash -# gluster volume set <VOLUME> nfs.disable false -``` - -Existing volumes that have gNFS enabled will remain enabled unless explicitly -disabled. 
You cannot run both gNFS and NFS-Ganesha servers on the same host. - -The plan is to phase gNFS out of Gluster over the next several releases, -starting with documenting it as officially deprecated, then not compiling and -packaging the components, and ultimately removing the component sources from the -source tree. - -### SEEK -*Notes for users:* -All modern filesystems support SEEK_DATA and SEEK_HOLE with the lseek() -systemcall. This improves performance when reading sparse files. GlusterFS now -supports the SEEK operation as well. Linux kernel 4.5 comes with an improved -FUSE module where lseek() can be used. QEMU can now detect holes in VM images -when using the Gluster-block driver. - -*Limitations:* -The deprecated stripe functionality has not been extended with SEEK. SEEK for -sharding has not been implemented yet, and is expected to follow later in a 3.8 -update ([bug 1301647](https://bugzilla.redhat.com/1301647)). NFS-Ganesha will -support SEEK over NFSv4 in the near future, possibly with the upcoming -NFS-Ganesha 2.4. - -### Compound operations -*Notes for users:* -This feature is being introduced with the intention of improving performance -for file operations on a glusterfs volume. A framework for combining two or -more file operations is provided which will help in reducing the network round -trips made for certain fops. This reduces latency of those fops and provides -better performance than before. - -*Limitations:* -Only very few and specific sets of operations can be compounded at this time. -There is no interface in libgfapi for creating compound operations yet. - -### Geo-replication for Sharded Volumes -*Notes for users:* -With Sharding support, Geo-replication detects small sharded files and syncs to -the slave(s) instead of syncing big files. This enables Geo-rep to sync files -from the master volume to the slave volume(s) more efficiently. - -*Limitations:* -If Sharding is enabled at the master volume then it should be enabled at slave -volume as well. - -### Tiering aware Geo-replication -*Notes for users:* -Tiering moves files between hot/cold tier bricks. Geo-replication syncs files -from bricks in Master volume to Slave volume. With this, Users can configure -geo-replication session in a Tiering based volume. - -*Limitations:* -Configuring geo-replication session in Tiering based volume is same as earlier. -But, before attaching/detaching tier, a few steps needs to be followed: - -Before attaching a tier to a volume with an existing geo-replication session, -the session needs to be stopped. Please find detailed steps in the chapter -called [Attaching a Tier to a Geo-replicated -Volume](https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Administration_Guide/chap-Managing_Data_Tiering-Attach_Volumes.html#idp11442496). - -While detaching a tier from a Tiering based volume with existing geo-replication -session, checkpoint of session needs to be done. Please find detailed steps in -the chapter called [Detaching a Tier of a Geo-replicated -Volume](https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Administration_Guide/chap-Managing_Data_Tiering-Detach_Tier.html#idp32905264). - -### Enhance Quota enable/disable in glusterd -*Notes for users:* -The enhancement will spawn the crawl process for each brick in the volume and -files will be checked in parallel, which is an independent process for every -brick. This improves the speed of crawling process, thus enhancing the quota -enable/disable process. 
With this feature, the user need not wait for a long -time once enable or disable quota command is issued. - -### Automagic unsplit-brain by [ctime|mtime|size|majority] for AFR -*Notes for users:* -A new volume option has been introduced called `cluster.favorite-child-policy`. -It can be used to automatically resolve split-brains in replica volumes without -having to use the gluster CLI or the `fuse-mount-setfattr-based` methods to -manually select a source. The healing automatically happens based on various -policies that this option takes. See `gluster volume set help|grep -cluster.favorite-child-policy -A3` for the various policies that you can set. -The default value is 'none' , i.e. this feature is not enabled by default. - -*Limitations:* -`cluster.favorite-child-policy` applies to all files of the volume. It is -assumed that if this option is enabled with a particular policy, you don't care -to examine the split-brain files on a per file basis and use the appropriate -gluster split-brain resolution CLI to resolve them individually with different -policies. - -### glusterfs-coreutils packaged for Fedora and CentOS Storage SIG -*Notes for users:* -These are set of coreutils designed to act on GlusterFS volumes using its native -C API similar to standard Linux coreutils like cp, ls, mv etc. Anyone can easily -make use of these utilities to directly access volumes without mounting the same -via some protocol. Please refer [Admin -guide](http://gluster.readthedocs.org/en/latest/Administrator%20Guide/GlusterFS%20Coreutils/) -for more details. - -### WORM, Retention and Compliance -*Notes for users:* -This feature is about having WORM based compliance/archiving solution in -GlusterFS. This adds the file-level WORM/Retention feature to the existing -implementation of the WORM translator which works at volume level. Users can -switch between either volume-level WORM or file-level WORM/Retention features. -This feature will work only if the "read-only" and "worm" options on the volume -are "off" and the "worm-file-level option" is "on" on the volume. A file can be -in any of these three states: - -1. Normal: Where we can perform normal operations on the files -2. WORM-Retained: Where file will be immutable and undeletable -3. WORM: Where file will be immutable but deletable - -Added four volume set options: - -1. `features.worm-file-level`: It enables the file level WORM/Retention feature. - It is "off" by default -2. `features.retention-mode`: Takes two values - 1. `relax`: allows users to increase or decrease the retention period of a - WORM/Retained file (Can not be decreased below the modification time of the - file) - 2. `enterprise`: allows users only to increase the retention period of a - WORM/Retained file -3. `features.auto-commit-period`: time period at/after which the auto commit - feature should look for the dormant files for state transition. Default value - is 180 seconds -4. `features.default-retention-period`: time period till which a file should be - undeletable. This value is also used to find the dormant files, i.e., files - which are not modified for this much time, will qualify for state transition. - Default value is 120 seconds - -User can do the manual transition by using the `chmod -w <filename>` or -equivalent command or the lazy auto-commit will take place when I/O triggered -using timeouts for untouched files. The next I/O(link, unlink, rename, truncate) -will cause the transition - -*Limitations:* -1. 
No Data validation of Read-only data i.e Integration with bitrot not done -2. Internal operations like tiering, rebalancing, self-healing will fail on - WORMed files -3. No control on ctime - -### Lock migration -*Notes for users:* -In the current release, the lock state of a file is lost when the file moves to -another brick as part of rebalance. With the new lock migration feature, the -locks associated with a file will be migrated during a rebalance operation. - -Users can enable this feature by the following command: - -```bash -gluster volume set <vol-name> lock-migration on -``` - -*Limitations:* -The current implementation is experimental. Hence it is not recommended for a -production environment. This feature is planned to be stabilized in future -releases. Feedback from the community is welcome and greatly appreciated. - -### Granular Entry self-heal for AFR -*Notes for users:* -This feature can be enabled using the command - -```bash -gluster volume set <vol-name> granular-entry-heal on -``` - -*Limitations:* -1. The feature is not backward compatible. So please enable the option only - after you have upgraded all your clients and servers to 3.8 and op-version - is 30800 -2. Make sure the volume is stopped and there is no pending heal before you - enable the feature. - -### Gdeploy packaged for Fedora and EPEL -*Notes for users:* -With gdeploy, deployment and configuration is a lot easier, it abstracts the -complexities of learning and writing YAML files. And reusing the gdeploy -configuration files with slight modification is lot easier than editing the -YAML files, and debugging the errors. -Setting up a GlusterFS volume involves quite a bit of tasks like: - -1. Setting up PV, VG, LV (thinpools if necessary). -2. Peer probing the nodes. -3. CLI to create volume (which can get lengthy and error prone as the number of nodes increase). - -gdeploy helps in simplifying the above tasks and adds many more useful features -like installing packages, handling volumes remotely, setting volume options -while creating the volume so on... - -*Limitations:* -We cannot have periodic status checks or similar health monitoring of the -Gluster setup using gdeploy. - -So it does not keep track of the previous deployments you have made. You need -to give every detail that gdeploy would require at each stage of deployment as -it does not keep any state. - -### Glusterfind and Bareos Integration -*Notes for users:* -This is a first integration of Gluster with a Backup & Recovery Application. -The integration enabler is a [Bareos](http://bareos.org/) plugin for Glusterfs -and a Gluster python utility called glusterfind. The integration provides -facility to backup and restore from and to Glusterfs volumes via the libgfapi -library which interacts directly with the Glusterfs server and not via a -Glusterfs mount point. - -During the backup operation, the glusterfind utility helps to speed up full -file listing by parallelly running on bricks' back-end instead of using the -more expensive READDIR file operation needed when listing at a mount point. For -incremental changes, the glusterfind utility picks up changed files from file -system changelogs instead of crawling the entire file system scavenging for the -files' modification time. - -*Limitations:* -Since bareos interfaces with Glusterfs via the libgfapi library and needs to -execute the glusterfind tool, bareos needs to be running on one of the Gluster -cluster nodes to make the most of it. 
- -### Heketi -*Notes for users:* -[Heketi](https://github.com/heketi/heketi/wiki) provides a RESTful management -interface which can be used to manage the life cycle of GlusterFS volumes. With -Heketi, cloud services like OpenStack Manila, Kubernetes, and OpenShift can -dynamically provision GlusterFS volumes with any of the supported durability -types - -*Limitations:* -Currently, Heketi only provides volume create, delete, and expand commands. - -## Bugs addressed - -A total of 1782 patches has been sent, addressing 1228 bugs: - -- [#789278](https://bugzilla.redhat.com/789278): Issues reported by Coverity static analysis tool -- [#1004332](https://bugzilla.redhat.com/1004332): Setting of any option using volume set fails when the clients are in older version. -- [#1054694](https://bugzilla.redhat.com/1054694): A replicated volume takes too much to come online when one server is down -- [#1075611](https://bugzilla.redhat.com/1075611): [FEAT] log: enhance gluster log format with message ID and standardize errno reporting -- [#1092414](https://bugzilla.redhat.com/1092414): Disable NFS by default -- [#1093692](https://bugzilla.redhat.com/1093692): Resource/Memory leak issues reported by Coverity. -- [#1094119](https://bugzilla.redhat.com/1094119): Remove replace-brick with data migration support from gluster cli -- [#1109180](https://bugzilla.redhat.com/1109180): Issues reported by Cppcheck static analysis tool -- [#1110262](https://bugzilla.redhat.com/1110262): suid,sgid,sticky bit on directories not preserved when doing add-brick -- [#1114847](https://bugzilla.redhat.com/1114847): glusterd logs are filled with "readv on /var/run/a30ad20ae7386a2fe58445b1a2b1359c.socket failed (Invalid argument)" -- [#1117886](https://bugzilla.redhat.com/1117886): Gluster not resolving hosts with IPv6 only lookups -- [#1122377](https://bugzilla.redhat.com/1122377): [SNAPSHOT]: activate and deactivate doesn't do a handshake when a glusterd comes back -- [#1122395](https://bugzilla.redhat.com/1122395): man or info page of gluster needs to be updated with self-heal commands. 
-- [#1129939](https://bugzilla.redhat.com/1129939): NetBSD port -- [#1131275](https://bugzilla.redhat.com/1131275): I currently have no idea what rfc.sh is doing during at any specific moment -- [#1132465](https://bugzilla.redhat.com/1132465): [FEAT] Trash translator -- [#1141379](https://bugzilla.redhat.com/1141379): Geo-Replication - Fails to handle file renaming correctly between master and slave -- [#1142423](https://bugzilla.redhat.com/1142423): [DHT-REBALANCE]-DataLoss: The data appended to a file during its migration will be lost once the migration is done -- [#1143880](https://bugzilla.redhat.com/1143880): [FEAT] Exports and Netgroups Authentication for Gluster NFS mount -- [#1158654](https://bugzilla.redhat.com/1158654): [FEAT] Journal Based Replication (JBR - formerly NSR) -- [#1162905](https://bugzilla.redhat.com/1162905): hardcoded gsyncd path causes geo-replication to fail on non-redhat systems -- [#1163416](https://bugzilla.redhat.com/1163416): [USS]: From NFS, unable to go to .snaps directory (error: No such file or directory) -- [#1163543](https://bugzilla.redhat.com/1163543): Fix regression test spurious failures -- [#1165041](https://bugzilla.redhat.com/1165041): Different client can not execute "for((i=0;i<1000;i++));do ls -al;done" in a same directory at the sametime -- [#1166862](https://bugzilla.redhat.com/1166862): rmtab file is a bottleneck when lot of clients are accessing a volume through NFS -- [#1168819](https://bugzilla.redhat.com/1168819): [USS]: Need defined rules for snapshot-directory, setting to a/b works but in linux a/b is b is subdirectory of a -- [#1169317](https://bugzilla.redhat.com/1169317): rmtab file is a bottleneck when lot of clients are accessing a volume through NFS -- [#1170075](https://bugzilla.redhat.com/1170075): [RFE] : BitRot detection in glusterfs -- [#1171703](https://bugzilla.redhat.com/1171703): AFR+SNAPSHOT: File with hard link have different inode number in USS -- [#1171954](https://bugzilla.redhat.com/1171954): [RFE] Rebalance Performance Improvements -- [#1174765](https://bugzilla.redhat.com/1174765): Hook scripts are not installed after make install -- [#1176062](https://bugzilla.redhat.com/1176062): Force replace-brick lead to the persistent write(use dd) return Input/output error -- [#1176837](https://bugzilla.redhat.com/1176837): [USS] : statfs call fails on USS. -- [#1178619](https://bugzilla.redhat.com/1178619): Statfs is hung because of frame loss in quota -- [#1180545](https://bugzilla.redhat.com/1180545): Incomplete conservative merge for split-brained directories -- [#1188145](https://bugzilla.redhat.com/1188145): Disperse volume: I/O error on client when USS is turned on -- [#1188242](https://bugzilla.redhat.com/1188242): Disperse volume: client crashed while running iozone -- [#1189363](https://bugzilla.redhat.com/1189363): ignore_deletes option is not something you can configure -- [#1189473](https://bugzilla.redhat.com/1189473): [RFE] While creating a snapshot the timestamp has to be appended to the snapshot name. -- [#1193388](https://bugzilla.redhat.com/1193388): Disperse volume: Failed to update version and size (error 2) seen during delete operations -- [#1193636](https://bugzilla.redhat.com/1193636): [DHT:REBALANCE]: xattrs set on the file during rebalance migration will be lost after migration is over -- [#1194640](https://bugzilla.redhat.com/1194640): Tracker bug for Logging framework expansion. 
-- [#1194753](https://bugzilla.redhat.com/1194753): Storage tier feature -- [#1195947](https://bugzilla.redhat.com/1195947): Reduce the contents of dependencies from glusterfs-api -- [#1196027](https://bugzilla.redhat.com/1196027): Fix memory leak while using scandir -- [#1198849](https://bugzilla.redhat.com/1198849): Minor improvements and cleanup for the build system -- [#1199894](https://bugzilla.redhat.com/1199894): RFE: Clone of a snapshot -- [#1199985](https://bugzilla.redhat.com/1199985): [RFE] arbiter for 3 way replication -- [#1200082](https://bugzilla.redhat.com/1200082): [FEAT] - Sharding xlator -- [#1200254](https://bugzilla.redhat.com/1200254): NFS-Ganesha : Locking of global option file used by NFS-Ganesha. -- [#1200262](https://bugzilla.redhat.com/1200262): Upcall framework support along with cache_invalidation use case handled -- [#1200265](https://bugzilla.redhat.com/1200265): NFS-Ganesha: Handling GlusterFS CLI commands when NFS-Ganesha related commands are executed and other additional checks -- [#1200267](https://bugzilla.redhat.com/1200267): Upcall: Cleanup the expired upcall entries -- [#1200271](https://bugzilla.redhat.com/1200271): Upcall: xlator options for Upcall xlator -- [#1200364](https://bugzilla.redhat.com/1200364): longevity: Incorrect log level messages in posix_istat and posix_lookup -- [#1200704](https://bugzilla.redhat.com/1200704): rdma: properly handle memory registration during network interruption -- [#1201284](https://bugzilla.redhat.com/1201284): tools/glusterfind: Use Changelogs more effectively for GFID to Path conversion -- [#1201289](https://bugzilla.redhat.com/1201289): tools/glusterfind: Support Partial Find feature -- [#1202244](https://bugzilla.redhat.com/1202244): [Quota] : To have a separate quota.conf file for inode quota.
-- [#1202274](https://bugzilla.redhat.com/1202274): Minor improvements and code cleanup for libgfapi -- [#1202649](https://bugzilla.redhat.com/1202649): [georep]: Transition from xsync to changelog doesn't happen once the brick is brought online -- [#1202758](https://bugzilla.redhat.com/1202758): Disperse volume: brick logs are getting filled with "anonymous fd creation failed" messages -- [#1203089](https://bugzilla.redhat.com/1203089): Disperse volume: misleading unsuccessful message with heal and heal full -- [#1203185](https://bugzilla.redhat.com/1203185): Detached node lists stale snaps -- [#1204641](https://bugzilla.redhat.com/1204641): [geo-rep] stop-all-gluster-processes.sh fails to stop all gluster processes -- [#1204651](https://bugzilla.redhat.com/1204651): libgfapi : Anonymous fd support in gfapi -- [#1205037](https://bugzilla.redhat.com/1205037): [SNAPSHOT]: "man gluster" needs modification for a few snapshot commands -- [#1205186](https://bugzilla.redhat.com/1205186): RCU changes wrt peers to be done for GlusterFS-3.7.0 -- [#1205540](https://bugzilla.redhat.com/1205540): Data Tiering:3.7.0:data loss:detach-tier not flushing data to cold-tier -- [#1205545](https://bugzilla.redhat.com/1205545): Effect of Trash translator over CTR translator -- [#1205596](https://bugzilla.redhat.com/1205596): [SNAPSHOT]: Output message when a snapshot create is issued when multiple bricks are down needs to be improved -- [#1205624](https://bugzilla.redhat.com/1205624): Data Tiering:rebalance fails on a tiered volume -- [#1206461](https://bugzilla.redhat.com/1206461): sparse file self heal fails under xfs version 2 with speculative preallocation feature on -- [#1206539](https://bugzilla.redhat.com/1206539): Tracker bug for GlusterFS documentation Improvement.
-- [#1206546](https://bugzilla.redhat.com/1206546): [RFE] Data Tiering:Need a way from CLI to identify hot and cold tier bricks easily -- [#1206587](https://bugzilla.redhat.com/1206587): Replace contrib/uuid by a libglusterfs wrapper that uses the uuid implementation from the OS -- [#1207020](https://bugzilla.redhat.com/1207020): BitRot :- CPU/disk throttling during signature calculation -- [#1207028](https://bugzilla.redhat.com/1207028): [Backup]: User must be warned while running the 'glusterfind pre' command twice without running the post command -- [#1207029](https://bugzilla.redhat.com/1207029): BitRot :- If a peer in the cluster doesn't have a brick then it should not start bitd on that node and should not create a partial volume file -- [#1207115](https://bugzilla.redhat.com/1207115): geo-rep: add debug logs to master for slave ENTRY operation failures -- [#1207134](https://bugzilla.redhat.com/1207134): BitRot :- bitd is not signing Objects if more than 3 bricks are present on the same node -- [#1207532](https://bugzilla.redhat.com/1207532): BitRot :- gluster volume help gives insufficient and ambiguous information for bitrot -- [#1207603](https://bugzilla.redhat.com/1207603): Persist file size and block count of sharded files in the form of xattrs -- [#1207615](https://bugzilla.redhat.com/1207615): sharding - Implement remaining fops -- [#1207627](https://bugzilla.redhat.com/1207627): BitRot :- Data scrubbing status is not available -- [#1207712](https://bugzilla.redhat.com/1207712): Input/Output error with disperse volume when geo-replication is started -- [#1207735](https://bugzilla.redhat.com/1207735): Disperse volume: Huge memory leak of glusterfsd process -- [#1207829](https://bugzilla.redhat.com/1207829): Incomplete self-heal and split-brain on directories found when self-healing files/dirs on a replaced disk -- [#1207979](https://bugzilla.redhat.com/1207979): BitRot :- In case of NFS mount, Object Versioning and file signing is not working as expected -- [#1208131](https://bugzilla.redhat.com/1208131): BitRot :- Tunables (scrub-throttle, scrub-frequency, pause/resume) for scrub functionality don't have any impact on scrubber -- [#1208470](https://bugzilla.redhat.com/1208470): [Dist-geo-rep] after snapshot in geo-rep setup, empty changelogs are generated in the snapped brick. -- [#1208482](https://bugzilla.redhat.com/1208482): pthread cond and mutex variables of fs struct have to be destroyed conditionally. -- [#1209104](https://bugzilla.redhat.com/1209104): Do not let an inode get evicted during the split-brain resolution process.
-- [#1209138](https://bugzilla.redhat.com/1209138): [Backup]: Packages to be installed for glusterfind api to work -- [#1209298](https://bugzilla.redhat.com/1209298): NFS interoperability problem: Gluster Striped-Replicated can't read on vmware esxi 5.x NFS client -- [#1209329](https://bugzilla.redhat.com/1209329): glusterd services are not handled properly when reconfiguring services -- [#1209430](https://bugzilla.redhat.com/1209430): quota/marker: turn off inode quotas by default -- [#1209461](https://bugzilla.redhat.com/1209461): BVT: glusterd crashed and dumped during upgrade (on rhel7.1 server) -- [#1209735](https://bugzilla.redhat.com/1209735): FSAL_GLUSTER : symlinks are not working properly if acl is enabled -- [#1209752](https://bugzilla.redhat.com/1209752): BitRot :- info about bitd and scrubber daemon is not shown in volume status -- [#1209818](https://bugzilla.redhat.com/1209818): BitRot :- volume info should not show 'features.scrub: resume' if scrub process is resumed -- [#1209843](https://bugzilla.redhat.com/1209843): [Backup]: Crash observed when multiple sessions were created for the same volume -- [#1209869](https://bugzilla.redhat.com/1209869): xdata in FOPs should always be valid and never junk -- [#1210344](https://bugzilla.redhat.com/1210344): Have a fixed name for common meta-volume for nfs, snapshot and geo-rep and mount it at a fixed mount location -- [#1210562](https://bugzilla.redhat.com/1210562): Dist-geo-rep: Too many "remote operation failed: No such file or directory" warning messages in auxiliary mount log on slave while executing "rm -rf" -- [#1210684](https://bugzilla.redhat.com/1210684): BitRot :- scrub pause/resume should give a proper error message if scrubber is already paused/resumed and Admin tries to perform the same operation -- [#1210687](https://bugzilla.redhat.com/1210687): BitRot :- If scrubber finds a bad file then it should log it as an 'ALERT' in the log, not 'Warning' -- [#1210689](https://bugzilla.redhat.com/1210689): BitRot :- Files marked as 'Bad' should not be accessible from mount -- [#1210934](https://bugzilla.redhat.com/1210934): qcow2 image creation using qemu-img hits segmentation fault -- [#1210965](https://bugzilla.redhat.com/1210965): Geo-replication very slow, not able to sync all the files to slave -- [#1211037](https://bugzilla.redhat.com/1211037): [dist-geo-rep]:Directory not empty and Stale file handle errors in geo-rep logs during deletes from master in history/changelog crawl -- [#1211123](https://bugzilla.redhat.com/1211123): ls command failed with features.read-only on while mounting ec volume. -- [#1211132](https://bugzilla.redhat.com/1211132): 'volume get' invoked on a non-existing key fails with zero as a return value -- [#1211220](https://bugzilla.redhat.com/1211220): quota: ENOTCONN periodically seen in logs when setting hard/soft timeout during I/O.
-- [#1211221](https://bugzilla.redhat.com/1211221): Any operation that relies on fd->flags may not work on anonymous fds -- [#1211264](https://bugzilla.redhat.com/1211264): Data Tiering: glusterd (management) communication issues seen on tiering setup -- [#1211327](https://bugzilla.redhat.com/1211327): Changelog: Changelog should be treated as discontinuous only on changelog enable/disable -- [#1211562](https://bugzilla.redhat.com/1211562): Data Tiering:UI:changes required to CLI responses for attach and detach tier -- [#1211570](https://bugzilla.redhat.com/1211570): Data Tiering:UI:when a user looks for detach-tier help, the command seems to be getting executed instead -- [#1211576](https://bugzilla.redhat.com/1211576): Gluster CLI crashes when volume create command is incomplete -- [#1211594](https://bugzilla.redhat.com/1211594): status.brick memory allocation failure. -- [#1211640](https://bugzilla.redhat.com/1211640): glusterd crash when snapshot create was in progress on different volumes at the same time - job edited to create snapshots at the given time -- [#1211749](https://bugzilla.redhat.com/1211749): glusterd crashes when brick option validation fails -- [#1211808](https://bugzilla.redhat.com/1211808): quota: inode quota not healing after upgrade -- [#1211836](https://bugzilla.redhat.com/1211836): glusterfs-api.pc versioning breaks QEMU -- [#1211848](https://bugzilla.redhat.com/1211848): Gluster namespace and module should be part of glusterfs-libs rpm -- [#1211900](https://bugzilla.redhat.com/1211900): package glupy as a subpackage under gluster namespace. -- [#1211913](https://bugzilla.redhat.com/1211913): nfs : racy condition in export/netgroup feature -- [#1211962](https://bugzilla.redhat.com/1211962): Disperse volume: Input/output errors on nfs and fuse mounts during delete operation -- [#1212037](https://bugzilla.redhat.com/1212037): Data Tiering:Old copy of file still remaining on EC (disperse) layer, when edited after attaching tier (new copy is moved to hot tier) -- [#1212063](https://bugzilla.redhat.com/1212063): [Geo-replication] cli crashed and core dump was observed while running gluster volume geo-replication vol0 status command -- [#1212110](https://bugzilla.redhat.com/1212110): bricks process crash -- [#1212253](https://bugzilla.redhat.com/1212253): cli should return error with inode quota cmds on cluster with op_version less than 3.7 -- [#1212385](https://bugzilla.redhat.com/1212385): Disable rpc throttling for glusterfs protocol -- [#1212398](https://bugzilla.redhat.com/1212398): [New] - Distribute replicate volume type is shown as Distribute Stripe in the output of gluster volume info <volname> --xml -- [#1212400](https://bugzilla.redhat.com/1212400): Attach tier failing and messing up vol info -- [#1212410](https://bugzilla.redhat.com/1212410): dist-geo-rep : all the bricks of a node show faulty in status if the slave node to which at least one of the bricks is connected goes down. -- [#1212413](https://bugzilla.redhat.com/1212413): [RFE] Return proper error codes in case of snapshot failure -- [#1212437](https://bugzilla.redhat.com/1212437): probing and detaching a peer generated a CRITICAL error - "Could not find peer" in glusterd logs -- [#1212660](https://bugzilla.redhat.com/1212660): Crashes in logging code -- [#1212816](https://bugzilla.redhat.com/1212816): NFS-Ganesha : Add-node and delete-node should start/stop NFS-Ganesha service -- [#1213063](https://bugzilla.redhat.com/1213063): The tiering feature requires counters.
-- [#1213066](https://bugzilla.redhat.com/1213066): Failure in tests/performance/open-behind.t -- [#1213125](https://bugzilla.redhat.com/1213125): Bricks fail to start with tiering related logs on the brick -- [#1213295](https://bugzilla.redhat.com/1213295): Glusterd crashed after updating to 3.8 nightly build -- [#1213349](https://bugzilla.redhat.com/1213349): [Snapshot] Scheduler should check whether vol-name exists before adding scheduled jobs -- [#1213358](https://bugzilla.redhat.com/1213358): Implement directory heal for ec -- [#1213364](https://bugzilla.redhat.com/1213364): [RFE] Quota: Make "quota-deem-statfs" option ON, by default, when quota is enabled. -- [#1213542](https://bugzilla.redhat.com/1213542): Symlink heal leaks 'linkname' memory -- [#1213752](https://bugzilla.redhat.com/1213752): nfs-ganesha: Multi-head nfs needs Upcall Cache invalidation support -- [#1213773](https://bugzilla.redhat.com/1213773): upcall: polling is done for an invalid file -- [#1213933](https://bugzilla.redhat.com/1213933): common-ha: delete-node implementation -- [#1214048](https://bugzilla.redhat.com/1214048): IO touching a file undergoing migration fails for tiered volumes -- [#1214219](https://bugzilla.redhat.com/1214219): Data Tiering:Enabling quota command fails with "quota command failed : Commit failed on localhost" -- [#1214222](https://bugzilla.redhat.com/1214222): Directories are missing on the mount point after attaching tier to distribute replicate volume. -- [#1214289](https://bugzilla.redhat.com/1214289): I/O failure on attaching tier -- [#1214561](https://bugzilla.redhat.com/1214561): [Backup]: To capture path for deletes in changelog file -- [#1214574](https://bugzilla.redhat.com/1214574): Snapshot-scheduling helper script errors out while running "snap_scheduler.py init" -- [#1215002](https://bugzilla.redhat.com/1215002): glusterd crashed on the node when trying to detach a tier after restoring data from the snapshot. -- [#1215018](https://bugzilla.redhat.com/1215018): [New] - gluster peer status goes to disconnected state. -- [#1215117](https://bugzilla.redhat.com/1215117): Disperse volume: rebalance and quotad crashed -- [#1215122](https://bugzilla.redhat.com/1215122): Data Tiering: attaching a tier with non supported replica count crashes glusterd on local host -- [#1215161](https://bugzilla.redhat.com/1215161): rpc: Memory corruption because rpcsvc_register_notify interprets opaque mydata argument as xlator pointer -- [#1215187](https://bugzilla.redhat.com/1215187): timeout/expiry of group-cache should be set to 300 seconds -- [#1215265](https://bugzilla.redhat.com/1215265): Fixes for data self-heal in ec -- [#1215486](https://bugzilla.redhat.com/1215486): configure: automake defaults to Unix V7 tar, w/ max filename length=99 chars -- [#1215550](https://bugzilla.redhat.com/1215550): glusterfsd crashed after directory was removed from the mount point, while self-heal and rebalance were running on the volume -- [#1215571](https://bugzilla.redhat.com/1215571): Data Tiering: add tiering set options to volume set help (cluster.tier-demote-frequency and cluster.tier-promote-frequency) -- [#1215592](https://bugzilla.redhat.com/1215592): Crash in dht_getxattr_cbk -- [#1215660](https://bugzilla.redhat.com/1215660): tiering: cksum mismatch for tiered volume.
-- [#1215896](https://bugzilla.redhat.com/1215896): Typos in the messages logged by the CTR translator -- [#1216067](https://bugzilla.redhat.com/1216067): Autogenerated files delivered in tarball -- [#1216187](https://bugzilla.redhat.com/1216187): readdir-ahead needs to be enabled by default for new volumes on gluster-3.7 -- [#1216898](https://bugzilla.redhat.com/1216898): Data Tiering: Volume inconsistency errors getting logged when attaching uneven (odd) number of hot bricks in hot tier (pure distribute tier layer) to a dist-rep volume -- [#1216931](https://bugzilla.redhat.com/1216931): [Snapshot] Snapshot scheduler shows status as disabled even when it is enabled -- [#1216960](https://bugzilla.redhat.com/1216960): data tiering: do not allow tiering related volume set options on a regular volume -- [#1217311](https://bugzilla.redhat.com/1217311): Disperse volume: gluster volume status doesn't show shd status -- [#1217701](https://bugzilla.redhat.com/1217701): ec test spurious failures -- [#1217766](https://bugzilla.redhat.com/1217766): Spurious failures in tests/bugs/distribute/bug-1122443.t -- [#1217786](https://bugzilla.redhat.com/1217786): Data Tiering : Adding performance to unlink/link/rename in CTR Xlator -- [#1217788](https://bugzilla.redhat.com/1217788): spurious failure bug-908146.t -- [#1217937](https://bugzilla.redhat.com/1217937): DHT/Tiering/Rebalancer: The Client PID set by tiering migration is getting reset by dht migration -- [#1217949](https://bugzilla.redhat.com/1217949): Null check before freeing dir_dfmeta and tmp_container -- [#1218055](https://bugzilla.redhat.com/1218055): "Snap_scheduler disable" should have different return codes for different failures. -- [#1218060](https://bugzilla.redhat.com/1218060): [SNAPSHOT]: Initializing snap_scheduler from all nodes at the same time should give a proper error message -- [#1218120](https://bugzilla.redhat.com/1218120): Regression failures in tests/bugs/snapshot/bug-1162498.t -- [#1218164](https://bugzilla.redhat.com/1218164): [SNAPSHOT] : Correction required in output message after initialising snap_scheduler -- [#1218287](https://bugzilla.redhat.com/1218287): Use tiering only if all nodes are capable of it at the proper version -- [#1218304](https://bugzilla.redhat.com/1218304): Intermittent failure of basic/afr/data-self-heal.t -- [#1218552](https://bugzilla.redhat.com/1218552): Rsync Hang and Georep fails to Sync files -- [#1218573](https://bugzilla.redhat.com/1218573): [Snapshot] Scheduled job is not processed when one of the nodes of the shared storage volume is down -- [#1218625](https://bugzilla.redhat.com/1218625): glfs.h:46:21: fatal error: sys/acl.h: No such file or directory -- [#1218638](https://bugzilla.redhat.com/1218638): tiering documentation -- [#1218717](https://bugzilla.redhat.com/1218717): Files migrated should stay on a tier for a full cycle -- [#1218854](https://bugzilla.redhat.com/1218854): Clean up should not empty the contents of the global config file -- [#1218951](https://bugzilla.redhat.com/1218951): Spurious failures in fop-sanity.t -- [#1218960](https://bugzilla.redhat.com/1218960): Rebalance Status output lists an extra colon " : " after volume rebalance: <vol_name>: success: -- [#1219032](https://bugzilla.redhat.com/1219032): cli: While attaching a tier, the cli should always ask whether you really want to attach a tier or not. -- [#1219355](https://bugzilla.redhat.com/1219355): glusterd: Scrub and bitd reconfigure functions were not being called if quota is not enabled.
-- [#1219442](https://bugzilla.redhat.com/1219442): [Snapshot] Do not run scheduler if ovirt scheduler is running -- [#1219479](https://bugzilla.redhat.com/1219479): [Dist-geo-rep] after snapshot in geo-rep setup, empty changelogs are generated in the snapped brick. -- [#1219485](https://bugzilla.redhat.com/1219485): nfs-ganesha: Discrepancies with lock states recovery during migration -- [#1219637](https://bugzilla.redhat.com/1219637): Gluster small-file creates do not scale with brick count -- [#1219732](https://bugzilla.redhat.com/1219732): brick-op failure for glusterd command should log an error message in cmd_history.log -- [#1219738](https://bugzilla.redhat.com/1219738): Regression failures in tests/bugs/snapshot/bug-1112559.t -- [#1219784](https://bugzilla.redhat.com/1219784): bitrot: glusterd is crashing when a user enables bitrot on the volume -- [#1219816](https://bugzilla.redhat.com/1219816): Spurious failure in tests/bugs/replicate/bug-976800.t -- [#1219846](https://bugzilla.redhat.com/1219846): Data Tiering: glusterd (management) communication issues seen on tiering setup -- [#1219894](https://bugzilla.redhat.com/1219894): [georep]: Creating geo-rep session kills all the brick processes -- [#1219937](https://bugzilla.redhat.com/1219937): Running status a second time shows no active sessions -- [#1219954](https://bugzilla.redhat.com/1219954): The python-gluster RPM should be 'noarch' -- [#1220016](https://bugzilla.redhat.com/1220016): bitrot testcases fail spuriously -- [#1220058](https://bugzilla.redhat.com/1220058): Disable known bad tests -- [#1220173](https://bugzilla.redhat.com/1220173): SEEK_HOLE support (optimization) -- [#1220329](https://bugzilla.redhat.com/1220329): DHT Rebalance : Misleading log messages for linkfiles -- [#1220332](https://bugzilla.redhat.com/1220332): DHT rebalance: Dict_copy log messages when running rebalance on a dist-rep volume -- [#1220348](https://bugzilla.redhat.com/1220348): Client hung up on listing the files on a particular directory -- [#1220381](https://bugzilla.redhat.com/1220381): unable to start the volume with the latest beta1 rpms -- [#1220670](https://bugzilla.redhat.com/1220670): snap_scheduler script must be usable as a python module. -- [#1220713](https://bugzilla.redhat.com/1220713): Scrubber should be disabled once bitrot is reset -- [#1221008](https://bugzilla.redhat.com/1221008): libgfapi: Segfault seen when glfs_*() methods are invoked with an invalid glfd -- [#1221025](https://bugzilla.redhat.com/1221025): Glusterd crashes after enabling quota limit on a distrep volume.
-- [#1221095](https://bugzilla.redhat.com/1221095): Fix nfs/mount3.c build warnings reported in Koji -- [#1221104](https://bugzilla.redhat.com/1221104): Sharding - Skip update of block count and size for directories in readdirp callback -- [#1221128](https://bugzilla.redhat.com/1221128): `gluster volume heal <vol-name> split-brain' tries to heal even with insufficient arguments -- [#1221131](https://bugzilla.redhat.com/1221131): NFS-Ganesha: ACL should not be enabled by default -- [#1221145](https://bugzilla.redhat.com/1221145): ctdb's ping_pong lock tester fails with input/output error on disperse volume mounted with glusterfs -- [#1221270](https://bugzilla.redhat.com/1221270): Do not allow detach-tier commands on a non-tiered volume -- [#1221481](https://bugzilla.redhat.com/1221481): `ls' on a directory which has files with mismatching gfid's does not list anything -- [#1221490](https://bugzilla.redhat.com/1221490): fuse: check return value of setuid -- [#1221544](https://bugzilla.redhat.com/1221544): [Backup]: Unable to create a glusterfind session -- [#1221577](https://bugzilla.redhat.com/1221577): glusterfsd crashed on a quota enabled volume where snapshots were scheduled -- [#1221696](https://bugzilla.redhat.com/1221696): rebalance failing on one of the nodes -- [#1221737](https://bugzilla.redhat.com/1221737): Multi-threaded SHD support -- [#1221889](https://bugzilla.redhat.com/1221889): Log EEXIST errors in DEBUG level in fops MKNOD and MKDIR -- [#1221914](https://bugzilla.redhat.com/1221914): Implement MKNOD fop in bit-rot. -- [#1221938](https://bugzilla.redhat.com/1221938): SIGNING FAILURE Error messages are popping up in the bitd log -- [#1221970](https://bugzilla.redhat.com/1221970): tiering: use separate log/socket/pid file for tiering -- [#1222013](https://bugzilla.redhat.com/1222013): Simplify creation and set-up of meta-volume (shared storage) -- [#1222088](https://bugzilla.redhat.com/1222088): Data Tiering:3.7.0:data loss:detach-tier not flushing data to cold-tier -- [#1222092](https://bugzilla.redhat.com/1222092): rebalance failed after attaching the tier to the volume.
-- [#1222126](https://bugzilla.redhat.com/1222126): DHT: lookup-unhashed feature breaks runtime compatibility with older client versions -- [#1222238](https://bugzilla.redhat.com/1222238): features/changelog: buffer overrun in changelog-helpers -- [#1222317](https://bugzilla.redhat.com/1222317): Building packages on RHEL-5 based distributions fails -- [#1222319](https://bugzilla.redhat.com/1222319): Remove all occurrences of #include "config.h" -- [#1222378](https://bugzilla.redhat.com/1222378): GlusterD fills the logs when the NFS-server is disabled -- [#1222379](https://bugzilla.redhat.com/1222379): Fix infinite looping in shard_readdir(p) on '/' -- [#1222769](https://bugzilla.redhat.com/1222769): libglusterfs: fix uninitialized argument value -- [#1222840](https://bugzilla.redhat.com/1222840): I/O's hanging on tiered volumes (NFS) -- [#1222898](https://bugzilla.redhat.com/1222898): geo-replication: fix memory leak in gsyncd -- [#1223185](https://bugzilla.redhat.com/1223185): [SELinux] [BVT]: Selinux throws AVC errors while running DHT automation on Rhel6.6 -- [#1223213](https://bugzilla.redhat.com/1223213): gluster volume status fails with locking failed error message -- [#1223280](https://bugzilla.redhat.com/1223280): [geo-rep]: worker died with "ESTALE" when rm -rf was performed on a directory from the mount of the master volume -- [#1223338](https://bugzilla.redhat.com/1223338): glusterd could crash in remove-brick-status when local remove-brick process has just completed -- [#1223378](https://bugzilla.redhat.com/1223378): gfid-access: Remove dead increment (dead store) -- [#1223385](https://bugzilla.redhat.com/1223385): packaging: .pc files included in -api-devel should be in -devel -- [#1223432](https://bugzilla.redhat.com/1223432): Update gluster op version to 30701 -- [#1223625](https://bugzilla.redhat.com/1223625): rebalance : output of rebalance status should show ' run time ' in proper format (day,hour:min:sec) -- [#1223642](https://bugzilla.redhat.com/1223642): [geo-rep]: With tarssh the file is created at the slave but it doesn't get synced -- [#1223739](https://bugzilla.redhat.com/1223739): Quota: Do not allow set/unset of quota limit in heterogeneous cluster -- [#1223741](https://bugzilla.redhat.com/1223741): non-root geo-replication session goes to faulty state when the session is started -- [#1223759](https://bugzilla.redhat.com/1223759): Sharding - Fix posix compliance test failures. -- [#1223772](https://bugzilla.redhat.com/1223772): Though brick daemon is not running, gluster vol status command shows the pid -- [#1223798](https://bugzilla.redhat.com/1223798): Quota: spurious failures with quota testcases -- [#1223889](https://bugzilla.redhat.com/1223889): readdirp returns 64-bit inodes even if enable-ino32 is set -- [#1223937](https://bugzilla.redhat.com/1223937): Outdated autotools helper config.* files -- [#1224016](https://bugzilla.redhat.com/1224016): NFS: IOZone tests hang, disconnects and hung tasks seen in logs.
-- [#1224098](https://bugzilla.redhat.com/1224098): [geo-rep]: Even after successful sync, the DATA counter did not reset to 0 -- [#1224290](https://bugzilla.redhat.com/1224290): peers connected in the middle of a transaction are participating in the transaction -- [#1224596](https://bugzilla.redhat.com/1224596): [RFE] Provide hourly scrubbing option -- [#1224600](https://bugzilla.redhat.com/1224600): [RFE] Move signing trigger mechanism to [f]setxattr() -- [#1224611](https://bugzilla.redhat.com/1224611): Skip zero byte files when triggering signing -- [#1224857](https://bugzilla.redhat.com/1224857): DHT - rebalance - when any brick/sub-vol is down and rebalance is not performing any action (fixing lay-out or migrating data), it should not say 'Starting rebalance on volume <vol-name> has been successful'. -- [#1225018](https://bugzilla.redhat.com/1225018): Scripts/Binaries are not installed with +x bit -- [#1225323](https://bugzilla.redhat.com/1225323): Glusterfs client crash during fd migration after graph switch -- [#1225328](https://bugzilla.redhat.com/1225328): afr: unrecognized option in re-balance volfile -- [#1225330](https://bugzilla.redhat.com/1225330): tiering: tier daemon not restarting during volume/glusterd restart -- [#1225424](https://bugzilla.redhat.com/1225424): [Backup]: Misleading error message when glusterfind delete is given with non-existent volume -- [#1225465](https://bugzilla.redhat.com/1225465): [Backup]: Glusterfind session entry persists even after volume is deleted -- [#1225491](https://bugzilla.redhat.com/1225491): [AFR-V2] - afr_final_errno() should treat op_ret > 0 also as success -- [#1225542](https://bugzilla.redhat.com/1225542): [geo-rep]: snapshot creation times out even if geo-replication is in pause/stop/delete state -- [#1225564](https://bugzilla.redhat.com/1225564): [Backup]: RFE - Glusterfind CLI commands need to respond based on volume's start/stop state -- [#1225566](https://bugzilla.redhat.com/1225566): [geo-rep]: Traceback "ValueError: filedescriptor out of range in select()" observed while creating huge set of data on master -- [#1225571](https://bugzilla.redhat.com/1225571): [geo-rep]: client-rpc-fops.c:172:client3_3_symlink_cbk can be handled better/or ignore these messages in the slave cluster log -- [#1225572](https://bugzilla.redhat.com/1225572): nfs-ganesha: Getting issues for nfs-ganesha on new nodes of glusterfs, error is /etc/ganesha/ganesha-ha.conf: line 11: VIP_<hostname with fqdn>=<ip>: command not found -- [#1225716](https://bugzilla.redhat.com/1225716): tests : remove brick command execution displays success even when one of the bricks is down.
-- [#1225793](https://bugzilla.redhat.com/1225793): Spurious failure in tests/bugs/disperse/bug-1161621.t -- [#1226005](https://bugzilla.redhat.com/1226005): should not spawn another migration daemon on graph switch -- [#1226223](https://bugzilla.redhat.com/1226223): Mount broker user add command removes existing volume for a mountbroker user when a second volume is attached to the same user -- [#1226253](https://bugzilla.redhat.com/1226253): gluster volume heal info crashes -- [#1226276](https://bugzilla.redhat.com/1226276): Volume heal info not reporting files in split brain and core dumping, after upgrading to 3.7.0 -- [#1226279](https://bugzilla.redhat.com/1226279): GF_CONTENT_KEY should not be handled unless we are sure no other operations are in progress -- [#1226307](https://bugzilla.redhat.com/1226307): Volume start fails when glusterfs is source compiled with GCC v5.1.1 -- [#1226367](https://bugzilla.redhat.com/1226367): bug-973073.t fails spuriously -- [#1226384](https://bugzilla.redhat.com/1226384): build: xlators/mgmt/glusterd/src/glusterd-errno.h is not in dist tarball -- [#1226507](https://bugzilla.redhat.com/1226507): Honour afr self-heal volume set options from clients -- [#1226551](https://bugzilla.redhat.com/1226551): libglusterfs: Copy _all_ members of gf_dirent_t in entry_copy() -- [#1226714](https://bugzilla.redhat.com/1226714): auth_cache_entry structure barely gets cached -- [#1226717](https://bugzilla.redhat.com/1226717): racy condition in nfs/auth-cache feature -- [#1226829](https://bugzilla.redhat.com/1226829): gf_store_save_value fails to check for errors, leading to emptying files in /var/lib/glusterd/ -- [#1226881](https://bugzilla.redhat.com/1226881): tiering:compiler warning with gcc v5.1.1 -- [#1226902](https://bugzilla.redhat.com/1226902): bitrot: scrubber is crashing while a user sets any scrubber tunable value. -- [#1227204](https://bugzilla.redhat.com/1227204): glusterfsd: bricks crash while executing ls on nfs-ganesha vers=3 -- [#1227449](https://bugzilla.redhat.com/1227449): Fix deadlock in timer-wheel del_timer() API -- [#1227583](https://bugzilla.redhat.com/1227583): [Virt-RHGS] Creating an image on a gluster volume using qemu-img + gfapi throws error messages related to rpc_transport -- [#1227590](https://bugzilla.redhat.com/1227590): bug-857330/xml.t fails spuriously -- [#1227624](https://bugzilla.redhat.com/1227624): tests/geo-rep: Existing geo-rep regression test suite is time consuming.
-- [#1227646](https://bugzilla.redhat.com/1227646): Glusterd fails to start after volume restore, tier attach and node reboot -- [#1227654](https://bugzilla.redhat.com/1227654): linux untar hung after the bricks are up in an 8+4 config -- [#1227667](https://bugzilla.redhat.com/1227667): Minor improvements and code cleanup for protocol server/client -- [#1227803](https://bugzilla.redhat.com/1227803): tiering: tier status shows as " progressing " but there is no rebalance daemon running -- [#1227884](https://bugzilla.redhat.com/1227884): Update gluster op version to 30702 -- [#1227894](https://bugzilla.redhat.com/1227894): Increment op-version requirement for lookup-optimize configuration option -- [#1227904](https://bugzilla.redhat.com/1227904): Memory leak in marker xlator -- [#1227996](https://bugzilla.redhat.com/1227996): Objects are not signed upon truncate() -- [#1228093](https://bugzilla.redhat.com/1228093): Glusterd crash -- [#1228111](https://bugzilla.redhat.com/1228111): [Backup]: Crash observed when glusterfind pre is run after deleting a directory containing files -- [#1228112](https://bugzilla.redhat.com/1228112): tiering:glusterd crashed when trying to detach-tier commit force on a non-tiered volume. -- [#1228157](https://bugzilla.redhat.com/1228157): Provide and use a common way to do reference counting of (internal) structures -- [#1228415](https://bugzilla.redhat.com/1228415): Not able to export volume using nfs-ganesha -- [#1228492](https://bugzilla.redhat.com/1228492): [geo-rep]: RENAMEs are not synced to the slave when quota is enabled. -- [#1228613](https://bugzilla.redhat.com/1228613): [Snapshot] Python crashes with trace back notification when shared storage is unmounted from the Storage Node -- [#1228635](https://bugzilla.redhat.com/1228635): Do not invoke glfs_fini for glfs-heal processes. -- [#1228680](https://bugzilla.redhat.com/1228680): bitrot: (rfe) object signing wait time value should be tunable. -- [#1228696](https://bugzilla.redhat.com/1228696): geo-rep: gverify.sh throws an error if the slave_host entry is not added to the known_hosts file -- [#1228731](https://bugzilla.redhat.com/1228731): nfs-ganesha: rmdir logs "remote operation failed: Stale file handle" even though the operation is successful -- [#1228818](https://bugzilla.redhat.com/1228818): Add documentation for lookup-optimize configuration option in DHT -- [#1228952](https://bugzilla.redhat.com/1228952): Disperse volume : glusterfs crashed -- [#1229127](https://bugzilla.redhat.com/1229127): afr: Correction to self-heal-daemon documentation -- [#1229134](https://bugzilla.redhat.com/1229134): [Bitrot] Gluster v set <volname> bitrot enable command succeeds, which is not supported to enable bitrot -- [#1229139](https://bugzilla.redhat.com/1229139): glusterd: glusterd crashing if you run re-balance and vol status commands in parallel.
-- [#1229172](https://bugzilla.redhat.com/1229172): [AFR-V2] - Fix shd coredump from tests/bugs/glusterd/bug-948686.t -- [#1229297](https://bugzilla.redhat.com/1229297): [Quota] : Inode quota spurious failure -- [#1229609](https://bugzilla.redhat.com/1229609): Quota: " E [quota.c:1197:quota_check_limit] 0-ecvol-quota: Failed to check quota size limit" in brick logs -- [#1229639](https://bugzilla.redhat.com/1229639): build: fix gitclean target -- [#1229658](https://bugzilla.redhat.com/1229658): STACK_RESET may crash with concurrent statedump requests to a glusterfs process -- [#1229825](https://bugzilla.redhat.com/1229825): Add regression test for cluster lock in a heterogeneous cluster -- [#1229860](https://bugzilla.redhat.com/1229860): context of access control translator should be updated properly for GF_POSIX_ACL_*_KEY xattrs -- [#1229948](https://bugzilla.redhat.com/1229948): Ganesha-ha.sh cluster setup not working with RHEL7 and derivatives -- [#1230007](https://bugzilla.redhat.com/1230007): [Backup]: 'New' as well as 'Modify' entry getting recorded for a newly created hardlink -- [#1230015](https://bugzilla.redhat.com/1230015): [Backup]: Glusterfind pre fails with htime xattr updation error resulting in historical changelogs not available -- [#1230017](https://bugzilla.redhat.com/1230017): [Backup]: 'Glusterfind list' should display an appropriate output when there are no active sessions -- [#1230090](https://bugzilla.redhat.com/1230090): [geo-rep]: use_meta_volume config option should be validated for its values -- [#1230111](https://bugzilla.redhat.com/1230111): [Backup]: Glusterfind delete does not delete the session related information present in $GLUSTERD_WORKDIR -- [#1230121](https://bugzilla.redhat.com/1230121): [glusterd] glusterd crashed while trying to remove bricks - one selected from each replica set - after shrinking nX3 to nX2 to nX1 -- [#1230127](https://bugzilla.redhat.com/1230127): [Backup]: Chown/chgrp for a directory does not get recorded as a MODIFY entry in the outfile -- [#1230647](https://bugzilla.redhat.com/1230647): Disperse volume : client crashed while running IO -- [#1231132](https://bugzilla.redhat.com/1231132): Detect and send ENOTSUP if upcall feature is not enabled -- [#1231197](https://bugzilla.redhat.com/1231197): Snapshot daemon failed to run on newly created dist-rep volume with uss enabled -- [#1231205](https://bugzilla.redhat.com/1231205): [geo-rep]: rsync should be made a dependent package for geo-replication -- [#1231257](https://bugzilla.redhat.com/1231257): nfs-ganesha: trying to bring up nfs-ganesha on three nodes shows an error although pcs status reports the ganesha process on all three nodes -- [#1231264](https://bugzilla.redhat.com/1231264): DHT : for many operations the directory/file path is '(null)' in the brick log -- [#1231268](https://bugzilla.redhat.com/1231268): Fix invalid logic in tier.t -- [#1231425](https://bugzilla.redhat.com/1231425): use-after-free bug in dht -- [#1231437](https://bugzilla.redhat.com/1231437): Rebalance is failing in test cluster framework.
-- [#1231617](https://bugzilla.redhat.com/1231617): Scrubber crash upon pause -- [#1231619](https://bugzilla.redhat.com/1231619): BitRot :- Handle brick re-connection sanely in bitd/scrub process -- [#1231620](https://bugzilla.redhat.com/1231620): scrub frequency and throttle change information needs to be present in Scrubber log -- [#1231738](https://bugzilla.redhat.com/1231738): nfs-ganesha: volume is not in list of exports in case of volume stop followed by volume start -- [#1231789](https://bugzilla.redhat.com/1231789): Not able to create snapshots for geo-replicated volumes when session is created with root user -- [#1231876](https://bugzilla.redhat.com/1231876): Snapshot: When Cluster.enable-shared-storage is enabled, shared storage should get mounted after node reboot -- [#1232001](https://bugzilla.redhat.com/1232001): nfs-ganesha: 8 node pcs cluster setup fails -- [#1232165](https://bugzilla.redhat.com/1232165): NFS Authentication Performance Issue -- [#1232172](https://bugzilla.redhat.com/1232172): Disperse volume : 'ls -ltrh' doesn't list correct size of the files every time -- [#1232183](https://bugzilla.redhat.com/1232183): cli correction: if one tries to create multiple bricks on the same server, it shows replicate volume instead of disperse volume -- [#1232238](https://bugzilla.redhat.com/1232238): [RHEV-RHGS] After self-heal operation, VM Image file loses the sparseness property -- [#1232304](https://bugzilla.redhat.com/1232304): libglusterfs: delete duplicate code in libglusterfs/src/dict.c -- [#1232378](https://bugzilla.redhat.com/1232378): [remove-brick]: Creation of file from NFS writes to the decommissioned subvolume and subsequent lookup from fuse creates a link -- [#1232391](https://bugzilla.redhat.com/1232391): Sharding - Use (f)xattrop (as opposed to (f)setxattr) to update shard size and block count -- [#1232430](https://bugzilla.redhat.com/1232430): [SNAPSHOT] : Snapshot delete fails with error - Snap might not be in a usable state -- [#1232572](https://bugzilla.redhat.com/1232572): quota: quota list displays double the size of previous value, post heal completion.
-- [#1232658](https://bugzilla.redhat.com/1232658): Change default values of allow-insecure and bind-insecure -- [#1232666](https://bugzilla.redhat.com/1232666): [geo-rep]: Segmentation faults are observed on all the master nodes -- [#1232678](https://bugzilla.redhat.com/1232678): Disperse volume : data corruption with appending writes in 8+4 config -- [#1232686](https://bugzilla.redhat.com/1232686): quorum calculation might go for a toss for a concurrent peer probe command -- [#1232693](https://bugzilla.redhat.com/1232693): glusterd crashed when testing heal full on replaced disks -- [#1232729](https://bugzilla.redhat.com/1232729): [Backup]: Glusterfind session(s) created before starting the volume results in 'changelog not available' error, eventually -- [#1232912](https://bugzilla.redhat.com/1232912): [geo-rep]: worker died with "ESTALE" when rm -rf was performed on a directory from the mount of the master volume -- [#1233018](https://bugzilla.redhat.com/1233018): tests: Add the command being 'TEST'ed in all gluster logs -- [#1233139](https://bugzilla.redhat.com/1233139): Null pointer dereference in dht_migrate_complete_check_task -- [#1233151](https://bugzilla.redhat.com/1233151): rm command fails with "Transport end point not connected" during add brick -- [#1233162](https://bugzilla.redhat.com/1233162): [Quota] The root of the volume on which the quota is set shows the volume size as more than the actual volume size, when checked with the "df" command. -- [#1233246](https://bugzilla.redhat.com/1233246): nfs-ganesha: add node fails to add a new node to the cluster -- [#1233258](https://bugzilla.redhat.com/1233258): Possible double execution of the state machine for fops that start other subfops -- [#1233411](https://bugzilla.redhat.com/1233411): [geo-rep]: UnboundLocalError: local variable 'fd' referenced before assignment -- [#1233544](https://bugzilla.redhat.com/1233544): gluster v set help needs to be updated for cluster.enable-shared-storage option -- [#1233617](https://bugzilla.redhat.com/1233617): Introduce an ATOMIC_WRITE flag in posix writev -- [#1233624](https://bugzilla.redhat.com/1233624): nfs-ganesha: ganesha-ha.sh --refresh-config not working -- [#1234286](https://bugzilla.redhat.com/1234286): changelog: directory renames not getting recorded -- [#1234474](https://bugzilla.redhat.com/1234474): nfs-ganesha: delete node throws an error and pcs status also notifies about failures; in fact I/O also doesn't resume post grace period -- [#1234694](https://bugzilla.redhat.com/1234694): [geo-rep]: Setting meta volume config to false when meta volume is stopped/deleted leads geo-rep to faulty -- [#1234819](https://bugzilla.redhat.com/1234819): glusterd: glusterd crashes while importing a USS enabled volume which is already started -- [#1234842](https://bugzilla.redhat.com/1234842): GlusterD does not store updated peerinfo objects.
-- [#1234882](https://bugzilla.redhat.com/1234882): [geo-rep]: Feature fan-out fails with the use of meta volume config -- [#1235007](https://bugzilla.redhat.com/1235007): Allow only lookup and delete operations on a file that is in split-brain -- [#1235195](https://bugzilla.redhat.com/1235195): quota: marker accounting miscalculated when renaming a file on which write is in progress -- [#1235216](https://bugzilla.redhat.com/1235216): tar on a glusterfs mount displays "file changed as we read it" even though the file was not changed -- [#1235231](https://bugzilla.redhat.com/1235231): unix domain sockets on Gluster/NFS are created as fifo/pipe -- [#1235246](https://bugzilla.redhat.com/1235246): Missing trusted.ec.config xattr for files after heal process -- [#1235269](https://bugzilla.redhat.com/1235269): Data Tiering: Files not getting promoted once demoted -- [#1235292](https://bugzilla.redhat.com/1235292): [geo-rep]: set_geo_rep_pem_keys.sh needs modification in gluster path to support mount broker functionality -- [#1235359](https://bugzilla.redhat.com/1235359): [geo-rep]: Mountbroker setup goes to Faulty with ssh 'Permission Denied' Errors -- [#1235538](https://bugzilla.redhat.com/1235538): Porting the left out gf_log messages to the new logging API -- [#1235542](https://bugzilla.redhat.com/1235542): Upcall: Directory or file creation should send cache invalidation requests to parent directories -- [#1235582](https://bugzilla.redhat.com/1235582): snapd crashed due to stack overflow -- [#1235751](https://bugzilla.redhat.com/1235751): peer probe results in Peer Rejected (Connected) -- [#1235921](https://bugzilla.redhat.com/1235921): POSIX: brick logs filled with _gf_log_callingfn due to this==NULL in dict_get -- [#1235927](https://bugzilla.redhat.com/1235927): memory corruption in the way we maintain migration information in inodes. -- [#1235989](https://bugzilla.redhat.com/1235989): Do null check before dict_ref -- [#1236009](https://bugzilla.redhat.com/1236009): do an explicit lookup on the inodes linked in readdirp -- [#1236032](https://bugzilla.redhat.com/1236032): Tiering: unlink failed with error "Invalid argument" -- [#1236065](https://bugzilla.redhat.com/1236065): Disperse volume: FUSE I/O error after self healing the failed disk files -- [#1236128](https://bugzilla.redhat.com/1236128): Quota list is not working on tiered volume. -- [#1236212](https://bugzilla.redhat.com/1236212): Migration does not work when EC is used as a tiered volume. -- [#1236270](https://bugzilla.redhat.com/1236270): [Backup]: File movement across directories does not get captured in the output file in an X3 volume -- [#1236512](https://bugzilla.redhat.com/1236512): DHT + rebalance :- file permission got changed (sticky bit and setgid is set) after file migration failure -- [#1236561](https://bugzilla.redhat.com/1236561): Ganesha volume export failed -- [#1236945](https://bugzilla.redhat.com/1236945): glusterfsd crashed while rebalance and self-heal were in progress -- [#1237000](https://bugzilla.redhat.com/1237000): Add a test case for verifying that NO empty changelog is created -- [#1237174](https://bugzilla.redhat.com/1237174): Incorrect state created in '/var/lib/nfs/statd' -- [#1237381](https://bugzilla.redhat.com/1237381): Throttle background heals in disperse volumes -- [#1238054](https://bugzilla.redhat.com/1238054): Consecutive volume start/stop operations when ganesha.enable is on lead to errors -- [#1238063](https://bugzilla.redhat.com/1238063): libgfchangelog: Example programs are not working.
-- [#1238072](https://bugzilla.redhat.com/1238072): protocol/server doesn't reconfigure auth.ssl-allow options -- [#1238135](https://bugzilla.redhat.com/1238135): Initialize daemons on demand -- [#1238188](https://bugzilla.redhat.com/1238188): Not able to recover the corrupted file on Replica volume -- [#1238224](https://bugzilla.redhat.com/1238224): setting enable-shared-storage without mentioning the domain doesn't enable shared storage -- [#1238508](https://bugzilla.redhat.com/1238508): Renamed Files are missing after self-heal -- [#1238593](https://bugzilla.redhat.com/1238593): tiering/snapshot: Tier daemon failed to start during volume start after restoring into a tiered volume from a non-tiered volume. -- [#1238661](https://bugzilla.redhat.com/1238661): When bind-insecure is enabled, bricks may not be able to bind to port assigned by Glusterd -- [#1238747](https://bugzilla.redhat.com/1238747): Crash in Quota enforcer -- [#1238788](https://bugzilla.redhat.com/1238788): Fix build on Mac OS X, header guard macros -- [#1238791](https://bugzilla.redhat.com/1238791): Fix build on Mac OS X, gfapi symbol versions -- [#1238793](https://bugzilla.redhat.com/1238793): Fix build on Mac OS X, timerwheel spinlock -- [#1238796](https://bugzilla.redhat.com/1238796): Fix build on Mac OS X, configure(.ac) -- [#1238798](https://bugzilla.redhat.com/1238798): Fix build on Mac OS X, ACLs -- [#1238936](https://bugzilla.redhat.com/1238936): 'unable to get transaction op-info' error seen in glusterd log while executing gluster volume status command -- [#1238952](https://bugzilla.redhat.com/1238952): gf_msg_callingfn does not log the callers of the function in which it is called -- [#1239037](https://bugzilla.redhat.com/1239037): disperse: Wrong values for "cluster.heal-timeout" could be assigned using CLI -- [#1239044](https://bugzilla.redhat.com/1239044): [geo-rep]: killing a brick from the replica pair makes geo-rep session faulty with Traceback "ChangelogException" -- [#1239269](https://bugzilla.redhat.com/1239269): [Scheduler]: Unable to create Snapshots on RHEL-7.1 using Scheduler -- [#1240161](https://bugzilla.redhat.com/1240161): glusterfsd crashed after volume start force -- [#1240184](https://bugzilla.redhat.com/1240184): snap-view: mount crashes if debug mode is enabled -- [#1240210](https://bugzilla.redhat.com/1240210): Metadata self-heal is not handling failures properly while healing -- [#1240218](https://bugzilla.redhat.com/1240218): Scrubber log should mark file corrupted message as Alert not as information -- [#1240219](https://bugzilla.redhat.com/1240219): Scrubber log should mark file corrupted message as Alert not as information -- [#1240229](https://bugzilla.redhat.com/1240229): Unable to pause georep session if one of the nodes in the cluster is not part of the master volume.
-- [#1240244](https://bugzilla.redhat.com/1240244): Unable to examine file in metadata split-brain after setting `replica.split-brain-choice' attribute to a particular replica -- [#1240254](https://bugzilla.redhat.com/1240254): quota+afr: quotad crash "afr_local_init (local=0x0, priv=0x7fddd0372220, op_errno=0x7fddce1434dc) at afr-common.c:4112" -- [#1240284](https://bugzilla.redhat.com/1240284): Disperse volume: NFS crashed -- [#1240564](https://bugzilla.redhat.com/1240564): Gluster commands time out on SSL enabled system, after adding new node to trusted storage pool -- [#1240577](https://bugzilla.redhat.com/1240577): Data Tiering: Database locks observed on tiered volumes on continuous writes to a file -- [#1240581](https://bugzilla.redhat.com/1240581): quota/marker: marker code cleanup -- [#1240598](https://bugzilla.redhat.com/1240598): quota/marker: lk_owner is null while acquiring inodelk in rename operation -- [#1240621](https://bugzilla.redhat.com/1240621): tiering: Tier daemon stopped prior to graph switch. -- [#1240654](https://bugzilla.redhat.com/1240654): quota: allowed to set soft-limit %age beyond 100% -- [#1240916](https://bugzilla.redhat.com/1240916): glfs_loc_link: Update loc.inode with the existing inode in case it already exists -- [#1240949](https://bugzilla.redhat.com/1240949): quota: In enforcer, caching parents in ctx during build ancestry is not working -- [#1240952](https://bugzilla.redhat.com/1240952): [USS]: snapd process is not killed once the glusterd comes back -- [#1240970](https://bugzilla.redhat.com/1240970): [Data Tiering]: HOT Files get demoted from hot tier -- [#1240991](https://bugzilla.redhat.com/1240991): Quota: After rename operation, gluster v quota <volname> list-objects command gives incorrect no. of files in output -- [#1241054](https://bugzilla.redhat.com/1241054): Data Tiering: Rename of file is not heating up the file -- [#1241071](https://bugzilla.redhat.com/1241071): Spurious failure of ./tests/bugs/snapshot/bug-1109889.t -- [#1241104](https://bugzilla.redhat.com/1241104): Handle negative fcntl flock->l_len values -- [#1241133](https://bugzilla.redhat.com/1241133): nfs-ganesha: execution of script ganesha-ha.sh throws an error for a file -- [#1241153](https://bugzilla.redhat.com/1241153): quota: marker accounting can get miscalculated after upgrade to 3.7 -- [#1241274](https://bugzilla.redhat.com/1241274): Peer not recognized after IP address change -- [#1241379](https://bugzilla.redhat.com/1241379): Reduce 'CTR disabled' brick log message from ERROR to INFO/DEBUG -- [#1241480](https://bugzilla.redhat.com/1241480): ganesha volume export fails in rhel7.1 -- [#1241788](https://bugzilla.redhat.com/1241788): syncop: Include iatt in 'syncop_link' args -- [#1241882](https://bugzilla.redhat.com/1241882): GlusterD cannot restart after being probed into a cluster.
-- [#1241895](https://bugzilla.redhat.com/1241895): nfs-ganesha: add-node logic does not copy the "/etc/ganesha/exports" directory to the correct path on the newly added node -- [#1242030](https://bugzilla.redhat.com/1242030): nfs-ganesha: bricks crash while executing acl related operation for named group/user -- [#1242041](https://bugzilla.redhat.com/1242041): nfs-ganesha : Multiple settings of nfs4_acl on the same file will cause a brick crash -- [#1242254](https://bugzilla.redhat.com/1242254): fops fail with EIO on nfs mount after add-brick and rebalance -- [#1242280](https://bugzilla.redhat.com/1242280): Handle all errors equally in dict_set_bin() -- [#1242317](https://bugzilla.redhat.com/1242317): [RFE] Improve I/O latency during signing -- [#1242333](https://bugzilla.redhat.com/1242333): rdma : pending - porting log messages to a new framework -- [#1242421](https://bugzilla.redhat.com/1242421): Enable multi-threaded epoll for glusterd process -- [#1242504](https://bugzilla.redhat.com/1242504): [Data Tiering]: Frequency Counters of un-selected file in the DB won't get cleared after a promotion/demotion cycle -- [#1242570](https://bugzilla.redhat.com/1242570): GlusterD crashes when management encryption is enabled -- [#1242609](https://bugzilla.redhat.com/1242609): replacing an offline brick fails with "replace-brick" command -- [#1242742](https://bugzilla.redhat.com/1242742): Gluster peer probe with negative num -- [#1242809](https://bugzilla.redhat.com/1242809): Performance: Impact of Bitrot on I/O Performance -- [#1242819](https://bugzilla.redhat.com/1242819): Quota list on a volume hangs after glusterd restart on a node. -- [#1242875](https://bugzilla.redhat.com/1242875): Quota: Quota Daemon doesn't start after node reboot -- [#1242892](https://bugzilla.redhat.com/1242892): SMB: share entry from smb.conf is not removed after setting user.cifs and user.smb to disable. -- [#1242894](https://bugzilla.redhat.com/1242894): [RFE] 'gluster volume help' output could be sorted alphabetically -- [#1243108](https://bugzilla.redhat.com/1243108): bash tab completion fails with "grep: Invalid range end" -- [#1243187](https://bugzilla.redhat.com/1243187): Disperse volume : client glusterfs crashed while running IO -- [#1243382](https://bugzilla.redhat.com/1243382): EC volume: Replace bricks is not healing version of root directory -- [#1243391](https://bugzilla.redhat.com/1243391): fail the fops if inode context get fails -- [#1243753](https://bugzilla.redhat.com/1243753): Gluster cli logs invalid argument error on every gluster command execution -- [#1243774](https://bugzilla.redhat.com/1243774): glusterd crashed when a client which doesn't support SSL tries to mount an SSL enabled gluster volume -- [#1243785](https://bugzilla.redhat.com/1243785): [Backup]: Password of the peer nodes prompted whenever a glusterfind session is deleted.
-- [#1243798](https://bugzilla.redhat.com/1243798): quota/marker: dir count in inode quota is not atomic -- [#1243805](https://bugzilla.redhat.com/1243805): Gluster-nfs : unnecessary logging message in nfs.log for export feature -- [#1243806](https://bugzilla.redhat.com/1243806): logging: Revert usage of global xlator for log buffer -- [#1243812](https://bugzilla.redhat.com/1243812): [Backup]: Crash observed when keyboard interrupt is encountered in the middle of any glusterfind command -- [#1243838](https://bugzilla.redhat.com/1243838): [Backup]: Glusterfind list shows the session as corrupted on the peer node -- [#1243890](https://bugzilla.redhat.com/1243890): huge mem leak in posix xattrop -- [#1243946](https://bugzilla.redhat.com/1243946): RFE: posix: xattrop 'GF_XATTROP_ADD_DEF_ARRAY' implementation -- [#1244109](https://bugzilla.redhat.com/1244109): quota: brick crashes when create and remove are performed in parallel -- [#1244144](https://bugzilla.redhat.com/1244144): [Backup]: Glusterfind pre attribute '--output-prefix' not working as expected in case of DELETEs -- [#1244165](https://bugzilla.redhat.com/1244165): [RHEV-RHGS] App VMs paused due to IO error caused by split-brain, after initiating remove-brick operation -- [#1244613](https://bugzilla.redhat.com/1244613): using fop's dict for resolving causes problems -- [#1245045](https://bugzilla.redhat.com/1245045): Data Loss:Remove brick commit passing when remove-brick process has not even started (due to killing glusterd) -- [#1245065](https://bugzilla.redhat.com/1245065): "rm -rf *" from multiple mount points fails to remove directories on all the subvolumes -- [#1245142](https://bugzilla.redhat.com/1245142): DHT-rebalance: Rebalance hangs on distribute volume when glusterd is stopped on peer node -- [#1245276](https://bugzilla.redhat.com/1245276): ec returns EIO error in cases where a more specific error could be returned -- [#1245331](https://bugzilla.redhat.com/1245331): volume start command is failing when glusterfs is compiled with debug enabled -- [#1245380](https://bugzilla.redhat.com/1245380): [RFE] Render all mounts of a volume defunct upon access revocation -- [#1245425](https://bugzilla.redhat.com/1245425): IFS is not set back after being used as "[" in log_newer function of include.rc -- [#1245544](https://bugzilla.redhat.com/1245544): quota/marker: errors in log file 'Failed to get metadata for' -- [#1245547](https://bugzilla.redhat.com/1245547): sharding - Fix unlink of sparse files -- [#1245558](https://bugzilla.redhat.com/1245558): gluster vol quota dist-vol list is not displaying quota information. -- [#1245689](https://bugzilla.redhat.com/1245689): ec sequentializes all reads, limiting read throughput -- [#1245895](https://bugzilla.redhat.com/1245895): gluster snapshot status --xml gives back unexpected non-xml output -- [#1245935](https://bugzilla.redhat.com/1245935): Data Tiering: Change the error message when a detach-tier status is issued on a non-tier volume -- [#1245981](https://bugzilla.redhat.com/1245981): forgotten inodes are not being signed -- [#1246052](https://bugzilla.redhat.com/1246052): Deceiving log messages like "Failing STAT on gfid : split-brain observed.
[Input/output error]" reported -- [#1246082](https://bugzilla.redhat.com/1246082): sharding - Populate the aggregated ia_size and ia_blocks before unwinding (f)setattr to upper layers -- [#1246229](https://bugzilla.redhat.com/1246229): tier_lookup_heal.t contains incorrect file_on_fast_tier function -- [#1246275](https://bugzilla.redhat.com/1246275): POSIX ACLs as used by a FUSE mount can not use more than 32 groups -- [#1246432](https://bugzilla.redhat.com/1246432): ./tests/basic/volume-snapshot.t spurious fail causing glusterd crash. -- [#1246736](https://bugzilla.redhat.com/1246736): client3_3_removexattr_cbk floods the logs with "No data available" messages -- [#1246794](https://bugzilla.redhat.com/1246794): GF_LOG_NONE logs always -- [#1247108](https://bugzilla.redhat.com/1247108): sharding - OS installation on vm image hangs on a sharded volume -- [#1247152](https://bugzilla.redhat.com/1247152): SSL improvements: ECDH, DH, CRL, and accessible options -- [#1247529](https://bugzilla.redhat.com/1247529): [geo-rep]: rename followed by deletes causes ESTALE -- [#1247536](https://bugzilla.redhat.com/1247536): Dist-geo-rep : checkpoint doesn't reach even though all the files have been synced through hybrid crawl. -- [#1247563](https://bugzilla.redhat.com/1247563): ACL created on a dht.linkto file on a files that skipped rebalance -- [#1247603](https://bugzilla.redhat.com/1247603): glusterfs : fix double free possibility in the code -- [#1247765](https://bugzilla.redhat.com/1247765): Glusterfsd crashes because of thread-unsafe code in gf_authenticate -- [#1247930](https://bugzilla.redhat.com/1247930): rpc: check for unprivileged port should start at 1024 and not beyond 1024 -- [#1248298](https://bugzilla.redhat.com/1248298): [upgrade] After upgrade from 3.5 to 3.6 onwards version, bumping up op-version failed -- [#1248306](https://bugzilla.redhat.com/1248306): tiering: rename fails with "Device or resource busy" error message -- [#1248415](https://bugzilla.redhat.com/1248415): rebalance stuck at 0 byte when auth.allow is set -- [#1248521](https://bugzilla.redhat.com/1248521): quota : display the size equivalent to the soft limit percentage in gluster v quota <volname> list* command -- [#1248669](https://bugzilla.redhat.com/1248669): all: Make all the xlator fops static to avoid incorrect symbol resolution -- [#1248887](https://bugzilla.redhat.com/1248887): AFR: Make [f]xattrop metadata transaction -- [#1249391](https://bugzilla.redhat.com/1249391): Fix build on Mac OS X, booleans -- [#1249499](https://bugzilla.redhat.com/1249499): Make ping-timeout option configurable at a volume-level -- [#1250009](https://bugzilla.redhat.com/1250009): Dist-geo-rep: Too many "remote operation failed: No such file or directory" warning messages in auxilary mount log on slave while executing "rm -rf" -- [#1250170](https://bugzilla.redhat.com/1250170): Write performance from a Windows client on 3-way replicated volume decreases substantially when one brick in the replica set is brought down -- [#1250297](https://bugzilla.redhat.com/1250297): [New] - glusterfs dead when user creates a rdma volume -- [#1250387](https://bugzilla.redhat.com/1250387): [RFE] changes needed in snapshot info command's xml output. 
-- [#1250441](https://bugzilla.redhat.com/1250441): Sharding - Excessive logging of messages of the kind 'Failed to get trusted.glusterfs.shard.file-size for bf292f5b-6dd6-45a8-b03c-aaf5bb973c50'
-- [#1250582](https://bugzilla.redhat.com/1250582): Quota: volume-reset shouldn't remove quota-deem-statfs, unless explicitly specified, when quota is enabled.
-- [#1250601](https://bugzilla.redhat.com/1250601): nfs-ganesha: remove the entry of the deleted node
-- [#1250628](https://bugzilla.redhat.com/1250628): nfs-ganesha: ganesha-ha.sh --status is actually the same as "pcs status"
-- [#1250797](https://bugzilla.redhat.com/1250797): rpc: Address issues with transport object reference and leak
-- [#1250803](https://bugzilla.redhat.com/1250803): Perf: Metadata operation (ls -l) performance regression.
-- [#1250828](https://bugzilla.redhat.com/1250828): Tiering: segfault when trying to rename a file
-- [#1250855](https://bugzilla.redhat.com/1250855): sharding - Renames on non-sharded files failing with ENOMEM
-- [#1251042](https://bugzilla.redhat.com/1251042): while re-configuring the scrubber frequency, scheduling is not happening based on current time
-- [#1251121](https://bugzilla.redhat.com/1251121): Unable to demote files in tiered volumes when cold tier is EC.
-- [#1251346](https://bugzilla.redhat.com/1251346): statfs giving incorrect values for AFR arbiter volumes
-- [#1251446](https://bugzilla.redhat.com/1251446): Disperse volume: fuse mount hung after self healing
-- [#1251449](https://bugzilla.redhat.com/1251449): posix_make_ancestryfromgfid doesn't set op_errno
-- [#1251454](https://bugzilla.redhat.com/1251454): marker: set loc.parent if NULL
-- [#1251592](https://bugzilla.redhat.com/1251592): Fix the tests infra
-- [#1251674](https://bugzilla.redhat.com/1251674): Add known failures to bad_tests list
-- [#1251821](https://bugzilla.redhat.com/1251821): /usr/lib/glusterfs/ganesha/ganesha_ha.sh is distro specific
-- [#1251824](https://bugzilla.redhat.com/1251824): Sharding - Individual shards' ownership differs from that of the original file
-- [#1251857](https://bugzilla.redhat.com/1251857): nfs-ganesha: new volume creation tries to bring up glusterfs-nfs even when nfs-ganesha is already on
-- [#1251980](https://bugzilla.redhat.com/1251980): dist-geo-rep: geo-rep status shows Active/Passive even when all the gsync processes in a node are killed
-- [#1252121](https://bugzilla.redhat.com/1252121): tier.t contains pattern matching error in check_counters function
-- [#1252244](https://bugzilla.redhat.com/1252244): DHT : If Directory creation is in progress and rename of that Directory comes from another mount point then after both operations few files are not accessible and not listed on mount and more than one Directory has the same gfid
-- [#1252263](https://bugzilla.redhat.com/1252263): Sharding - Send inode forgets on _all_ shards if/when the protocol layer (FUSE/Gfapi) at the top sends a forget on the actual file
-- [#1252374](https://bugzilla.redhat.com/1252374): tests: no cleanup on receiving external signals INT, TERM and HUP
-- [#1252410](https://bugzilla.redhat.com/1252410): libgfapi : adding follow flag to glfs_h_lookupat()
-- [#1252448](https://bugzilla.redhat.com/1252448): Probing a new node, which is part of another cluster, should throw proper error message in logs and CLI
-- [#1252586](https://bugzilla.redhat.com/1252586): Legacy files pre-existing tier attach must be promoted
-- [#1252695](https://bugzilla.redhat.com/1252695): posix : pending - porting log messages to a new framework
-- [#1252696](https://bugzilla.redhat.com/1252696): After resetting diagnostics.client-log-level, Debug messages are still logged in the scrubber log
-- [#1252737](https://bugzilla.redhat.com/1252737): xml output for volume status on tiered volume
-- [#1252807](https://bugzilla.redhat.com/1252807): libgfapi : pending - porting log messages to a new framework
-- [#1252808](https://bugzilla.redhat.com/1252808): protocol server : Pending - porting log messages to a new framework
-- [#1252825](https://bugzilla.redhat.com/1252825): Though scrubber settings changed on one volume, log shows all volumes' scrubber information
-- [#1252836](https://bugzilla.redhat.com/1252836): libglusterfs: Pending - Porting log messages to new framework
-- [#1253149](https://bugzilla.redhat.com/1253149): performance translators: Pending - porting logging messages to new logging framework
-- [#1253309](https://bugzilla.redhat.com/1253309): AFR: gluster v restart force or brick process restart doesn't heal the files
-- [#1253828](https://bugzilla.redhat.com/1253828): glusterd: remove unused large memory/buffer allocations
-- [#1253831](https://bugzilla.redhat.com/1253831): glusterd: clean dead initializations
-- [#1253967](https://bugzilla.redhat.com/1253967): glusterfs doesn't include firewalld rules
-- [#1253970](https://bugzilla.redhat.com/1253970): garbage files created in /var/run/gluster
-- [#1254121](https://bugzilla.redhat.com/1254121): Start self-heal and display correct heal info after replace brick
-- [#1254127](https://bugzilla.redhat.com/1254127): Spurious failure blocking NetBSD regression runs
-- [#1254146](https://bugzilla.redhat.com/1254146): quota: numbers of warning messages in nfs.log for a single file itself
-- [#1254167](https://bugzilla.redhat.com/1254167): `gluster volume heal <vol-name> split-brain' changes required for entry-split-brain
-- [#1254428](https://bugzilla.redhat.com/1254428): Data Tiering : Writes to a file being promoted/demoted are missing once the file migration is complete
-- [#1254451](https://bugzilla.redhat.com/1254451): Data Tiering : Some tier xlator_fops translate to the default fops
-- [#1254494](https://bugzilla.redhat.com/1254494): nfs-ganesha: refresh-config stdout output does not make sense
-- [#1254850](https://bugzilla.redhat.com/1254850): Fix build on Mac OS X, glfs_h_lookupat symbol version
-- [#1254863](https://bugzilla.redhat.com/1254863): non-default symver macros are incorrect
-- [#1254934](https://bugzilla.redhat.com/1254934): Misleading error messages on brick logs while creating directory (mkdir) on fuse mount
-- [#1255310](https://bugzilla.redhat.com/1255310): Snapshot: When soft limit is reached and auto-delete is enabled, create snapshot doesn't log anything in log files
-- [#1255386](https://bugzilla.redhat.com/1255386): snapd/quota/nfs daemons run on the node, even after that node was detached from trusted storage pool
-- [#1255599](https://bugzilla.redhat.com/1255599): Remove unwanted tests from volume-snapshot.t
-- [#1255693](https://bugzilla.redhat.com/1255693): Tiering status command is very cumbersome.
-- [#1255694](https://bugzilla.redhat.com/1255694): glusterd: volume status backward compatibility
-- [#1256243](https://bugzilla.redhat.com/1256243): remove-brick: avoid mknod op falling on decommissioned brick even after fix-layout has happened on parent directory
-- [#1256352](https://bugzilla.redhat.com/1256352): gluster-nfs : contents of export file are not updated correctly in its context
-- [#1256580](https://bugzilla.redhat.com/1256580): sharding - VM image size as seen from the mount keeps growing beyond configured size on a sharded volume
-- [#1256588](https://bugzilla.redhat.com/1256588): arbiter-statfs.t fails spuriously in NetBSD regression
-- [#1257076](https://bugzilla.redhat.com/1257076): DHT-rebalance: rebalance status shows failed when replica pair bricks are brought down in distrep volume while rename of files is going on
-- [#1257110](https://bugzilla.redhat.com/1257110): Logging : unnecessary log message "REMOVEXATTR No data available " when files are written to glusterfs mount
-- [#1257149](https://bugzilla.redhat.com/1257149): Provide more meaningful errors on peer probe and peer detach
-- [#1257533](https://bugzilla.redhat.com/1257533): snapshot delete all command fails with --xml option.
-- [#1257694](https://bugzilla.redhat.com/1257694): quota: removexattr on /d/backends/patchy/.glusterfs/79/99/799929ec-f546-4bbf-8549-801b79623262 (for trusted.glusterfs.quota.add7e3f8-833b-48ec-8a03-f7cd09925468.contri) [No such file or directory]
-- [#1257709](https://bugzilla.redhat.com/1257709): Copy NFS-Ganesha export files as part of volume snapshot creation
-- [#1257792](https://bugzilla.redhat.com/1257792): bug-1238706-daemons-stop-on-peer-cleanup.t fails occasionally
-- [#1257847](https://bugzilla.redhat.com/1257847): Dist-geo-rep: Geo-replication doesn't work with NetBSD
-- [#1257911](https://bugzilla.redhat.com/1257911): add policy mechanism for promotion and demotion
-- [#1258196](https://bugzilla.redhat.com/1258196): gNFSd: NFS mount fails with "Remote I/O error"
-- [#1258311](https://bugzilla.redhat.com/1258311): trace xlator: Print write size also in trace_writev logs
-- [#1258334](https://bugzilla.redhat.com/1258334): Sharding - Unlink of VM images can sometimes fail with EINVAL
-- [#1258714](https://bugzilla.redhat.com/1258714): bug-948686.t fails spuriously
-- [#1258766](https://bugzilla.redhat.com/1258766): quota test 'quota-nfs.t' fails spuriously
-- [#1258801](https://bugzilla.redhat.com/1258801): Change order of marking AFR post op
-- [#1258883](https://bugzilla.redhat.com/1258883): build: compile error on RHEL5
-- [#1258905](https://bugzilla.redhat.com/1258905): Sharding - read/write performance improvements for VM workload
-- [#1258975](https://bugzilla.redhat.com/1258975): packaging: gluster-server install failure due to %ghost of hooks/.../delete
-- [#1259225](https://bugzilla.redhat.com/1259225): Add node of nfs-ganesha not working on rhel7.1
-- [#1259298](https://bugzilla.redhat.com/1259298): Tier xattr name is misleading (trusted.tier-gfid)
-- [#1259572](https://bugzilla.redhat.com/1259572): client is sending io to arbiter with replica 2
-- [#1259651](https://bugzilla.redhat.com/1259651): sharding - Fix reads on zero-byte shards representing holes in the file
-- [#1260051](https://bugzilla.redhat.com/1260051): DHT: Few files are missing after remove-brick operation
-- [#1260147](https://bugzilla.redhat.com/1260147): fuse client crashed during i/o
-- [#1260185](https://bugzilla.redhat.com/1260185): Data Tiering:Regression:Commit of detach tier passes directly without even issuing a detach tier start
-- [#1260545](https://bugzilla.redhat.com/1260545): Quota+Rebalance : While rebalance is in progress, quota list shows 'Used Space' more than the Hard Limit set
-- [#1260561](https://bugzilla.redhat.com/1260561): transport and port should be optional arguments for glfs_set_volfile_server
-- [#1260611](https://bugzilla.redhat.com/1260611): snapshot: from nfs-ganesha mount no content seen in .snaps/<snapshot-name> directory
-- [#1260637](https://bugzilla.redhat.com/1260637): sharding - Do not expose internal sharding xattrs to the application.
-- [#1260730](https://bugzilla.redhat.com/1260730): Database locking due to write contention between CTR sql connection and tier migrator sql connection
-- [#1260848](https://bugzilla.redhat.com/1260848): Disperse volume: df -h on an nfs mount throws Invalid argument error
-- [#1260918](https://bugzilla.redhat.com/1260918): [BACKUP]: If more than 1 node in cluster are not added in known_host, glusterfind create command hangs
-- [#1261260](https://bugzilla.redhat.com/1261260): [RFE]: Have reads be performed on same bricks for a given file
-- [#1261276](https://bugzilla.redhat.com/1261276): Tier/shd: Tracker bug for tier and shd compatibility
-- [#1261399](https://bugzilla.redhat.com/1261399): [HC] Fuse mount crashes when client-quorum is not met
-- [#1261404](https://bugzilla.redhat.com/1261404): No quota API to get real hard-limit value.
-- [#1261444](https://bugzilla.redhat.com/1261444): cli : volume start will create/overwrite ganesha export file
-- [#1261482](https://bugzilla.redhat.com/1261482): glusterd_copy_file can cause file corruption
-- [#1261741](https://bugzilla.redhat.com/1261741): Tier: glusterd crash when trying to detach, when hot tier has exactly one brick and cold tier is of replica type
-- [#1261757](https://bugzilla.redhat.com/1261757): Tiering/glusterd: volume status failed after detach tier start
-- [#1261773](https://bugzilla.redhat.com/1261773): features.sharding is not available in 'gluster volume set help'
-- [#1261819](https://bugzilla.redhat.com/1261819): Data Tiering: Disallow attach tier on a volume where any rebalance process is in progress to avoid deadlock (like remove brick commit pending etc)
-- [#1261837](https://bugzilla.redhat.com/1261837): Data Tiering:Volume task status showing as remove brick when detach tier is triggered
-- [#1261841](https://bugzilla.redhat.com/1261841): [HC] Implement fallocate, discard and zerofill with sharding
-- [#1261862](https://bugzilla.redhat.com/1261862): Data Tiering: detach-tier start force command not available on a tier volume (unlike what is possible with force remove-brick)
-- [#1261927](https://bugzilla.redhat.com/1261927): Minor improvements and code cleanup for rpc
-- [#1262345](https://bugzilla.redhat.com/1262345): `getfattr -n replica.split-brain-status <file>' command hung on the mount
-- [#1262438](https://bugzilla.redhat.com/1262438): Error not propagated correctly if selfheal layout lock fails
-- [#1262805](https://bugzilla.redhat.com/1262805): [upgrade] After upgrade from 3.5 to 3.6, probing a new 3.6 node is moving the peer to rejected state
-- [#1262881](https://bugzilla.redhat.com/1262881): nfs-ganesha: refresh-config stdout output includes dbus messages "method return sender=:1.61 -> dest=:1.65 reply_serial=2"
-- [#1263056](https://bugzilla.redhat.com/1263056): libgfapi: brick process crashes if attr KEY length > 255 for glfs_lgetxattr(...)
-- [#1263087](https://bugzilla.redhat.com/1263087): RHEL7/systemd : can't have server in debug mode anymore
-- [#1263100](https://bugzilla.redhat.com/1263100): Data Tiering: Tiering related information is not displayed in gluster volume status xml output
-- [#1263177](https://bugzilla.redhat.com/1263177): Data Tiering:Change error message as detach-tier error message throws as "remove-brick"
-- [#1263204](https://bugzilla.redhat.com/1263204): Data Tiering:Setting only promote frequency and no demote frequency causes crash
-- [#1263224](https://bugzilla.redhat.com/1263224): 'gluster v tier/attach-tier/detach-tier help' command shows the usage, and then throws 'Tier command failed' error message
-- [#1263549](https://bugzilla.redhat.com/1263549): I/O failure on attaching tier
-- [#1263726](https://bugzilla.redhat.com/1263726): Data Tiering:Detach tier status shows number of failures even when all files are migrated successfully
-- [#1265148](https://bugzilla.redhat.com/1265148): Dist-geo-rep: Support geo-replication to work with sharding
-- [#1265470](https://bugzilla.redhat.com/1265470): AFR : "gluster volume heal <volume_name> info" doesn't report the fqdn of storage nodes.
-- [#1265479](https://bugzilla.redhat.com/1265479): AFR: cluster options like data-self-heal, metadata-self-heal and entry-self-heal should not be allowed to be set if volume is not a distribute-replicate volume
-- [#1265516](https://bugzilla.redhat.com/1265516): sharding - Add more logs in failure code paths + port existing messages to the msg-id framework
-- [#1265522](https://bugzilla.redhat.com/1265522): Geo-Replication fails on uppercase hostnames
-- [#1265531](https://bugzilla.redhat.com/1265531): Message ids in quota-messages.h should start from 120000 as opposed to 110000
-- [#1265677](https://bugzilla.redhat.com/1265677): Have a way to disable readdirp on dht from glusterd volume set command
-- [#1265893](https://bugzilla.redhat.com/1265893): Perf: Getting bad performance while doing ls
-- [#1266476](https://bugzilla.redhat.com/1266476): RFE : Feature: Periodic FOP statistics dumps for v3.6.x/v3.7.x
-- [#1266818](https://bugzilla.redhat.com/1266818): Disabling enable-shared-storage deletes the volume with the name - "gluster_shared_storage"
-- [#1266834](https://bugzilla.redhat.com/1266834): AFR : fuse,nfs mount hangs when directories with same names are created and deleted continuously
-- [#1266875](https://bugzilla.redhat.com/1266875): geo-replication: [RFE] Geo-replication + Tiering
-- [#1266877](https://bugzilla.redhat.com/1266877): Possible memory leak during rebalance with large quantity of files
-- [#1266883](https://bugzilla.redhat.com/1266883): protocol/server: do not define the number of inodes in terms of memory units
-- [#1267539](https://bugzilla.redhat.com/1267539): Data Tiering:CLI crashes with segmentation fault when user tries "gluster v tier" command
-- [#1267812](https://bugzilla.redhat.com/1267812): Data Tiering:Promotions and demotions fail after quota hard limits are hit for a tier volume
-- [#1267950](https://bugzilla.redhat.com/1267950): need a way to pause/stop tiering to take snapshot
-- [#1267967](https://bugzilla.redhat.com/1267967): core: use syscall wrappers instead of making direct syscalls
-- [#1268755](https://bugzilla.redhat.com/1268755): Data Tiering:Throw a warning when user issues a detach-tier commit command
-- [#1268790](https://bugzilla.redhat.com/1268790): Add bug-1221481-allow-fops-on-dir-split-brain.t to bad test
-- [#1268796](https://bugzilla.redhat.com/1268796): Test tests/bugs/shard/bug-1245547.t failing consistently when run with patch http://review.gluster.org/#/c/11938/
-- [#1268810](https://bugzilla.redhat.com/1268810): gluster v status --xml for a replicated hot tier volume
-- [#1268822](https://bugzilla.redhat.com/1268822): tier/cli: number of bricks remains the same in v info --xml
-- [#1269375](https://bugzilla.redhat.com/1269375): rm -rf on /run/gluster/vol/<directory name>/ is not showing quota output header for other quota limit applied directories
-- [#1269461](https://bugzilla.redhat.com/1269461): Feature: Entry self-heal performance enhancements using more granular changelogs
-- [#1269470](https://bugzilla.redhat.com/1269470): Self-heal daemon crashes when bricks go down at the time of data heal
-- [#1269696](https://bugzilla.redhat.com/1269696): Glusterfsd crashes on pmap signin failure
-- [#1269754](https://bugzilla.redhat.com/1269754): Core:Blocker:Segmentation fault when using fallocate command on a gluster volume
-- [#1270328](https://bugzilla.redhat.com/1270328): Rare transport endpoint not connected error in tier.t tests.
-- [#1270668](https://bugzilla.redhat.com/1270668): Index entries are not being purged in case file does not exist
-- [#1270694](https://bugzilla.redhat.com/1270694): Introduce priv dump in shard xlator for better debugging
-- [#1271148](https://bugzilla.redhat.com/1271148): Tier: Do not promote/demote files on which POSIX locks are held
-- [#1271150](https://bugzilla.redhat.com/1271150): libglusterfs : glusterd was not restarting after setting key=value length beyond PATH_MAX (4096) characters
-- [#1271310](https://bugzilla.redhat.com/1271310): RFE : Feature: Tunable FOP sampling for v3.6.x/v3.7.x
-- [#1271325](https://bugzilla.redhat.com/1271325): RFE: use code generation for repetitive stuff
-- [#1271358](https://bugzilla.redhat.com/1271358): ECVOL: glustershd log grows quickly and fills up the root volume
-- [#1271907](https://bugzilla.redhat.com/1271907): Improvement in install & package header files
-- [#1272006](https://bugzilla.redhat.com/1272006): tools/glusterfind: add query command to list files without session
-- [#1272207](https://bugzilla.redhat.com/1272207): Data Tiering:Filenames with spaces are not getting migrated at all
-- [#1272319](https://bugzilla.redhat.com/1272319): Tier : Move common functions into tier.rc
-- [#1272339](https://bugzilla.redhat.com/1272339): Creating an already deleted snapshot-clone deletes the corresponding snapshot.
-- [#1272362](https://bugzilla.redhat.com/1272362): Fix in afr transaction code
-- [#1272411](https://bugzilla.redhat.com/1272411): quota: set quota version for files/directories
-- [#1272460](https://bugzilla.redhat.com/1272460): Disk usage mismatching after self-heal
-- [#1272557](https://bugzilla.redhat.com/1272557): [Tier]: man page of gluster should be updated to list tier commands
-- [#1272949](https://bugzilla.redhat.com/1272949): I/O failure on attaching tier on nfs client
-- [#1272986](https://bugzilla.redhat.com/1272986): [sharding+geo-rep]: On existing slave mount, reading files fails to show sharded file content
-- [#1273043](https://bugzilla.redhat.com/1273043): Data Tiering:Lot of Promotions/Demotions failed error messages
-- [#1273215](https://bugzilla.redhat.com/1273215): Data Tiering:Promotions fail when bricks of EC (disperse) cold layer are down
-- [#1273315](https://bugzilla.redhat.com/1273315): fuse: Avoid redundant lookup on "." and ".." as part of every readdirp
-- [#1273372](https://bugzilla.redhat.com/1273372): Data Tiering:getting failed to fsync on germany-hot-dht (Structure needs cleaning) warning
-- [#1273387](https://bugzilla.redhat.com/1273387): FUSE clients in a container environment hang and do not recover post losing connections to all bricks
-- [#1273726](https://bugzilla.redhat.com/1273726): Fully support data-tiering in 3.7.x, remove out of 'experimental' status
-- [#1274626](https://bugzilla.redhat.com/1274626): Remove selinux mount option from "man mount.glusterfs"
-- [#1274629](https://bugzilla.redhat.com/1274629): Data Tiering:error "[2015-10-14 18:15:09.270483] E [MSGID: 122037] [ec-common.c:1502:ec_update_size_version_done] 0-tiervolume-disperse-1: Failed to update version and size [Input/output error]"
-- [#1274847](https://bugzilla.redhat.com/1274847): CTR should be enabled on attach tier, disabled otherwise.
-- [#1275247](https://bugzilla.redhat.com/1275247): I/O hangs while self-heal is in progress on files
-- [#1275383](https://bugzilla.redhat.com/1275383): Data Tiering:Getting lookup failed on files in hot tier, when volume is restarted
-- [#1275489](https://bugzilla.redhat.com/1275489): Enhance the naming used for bugs for better name space
-- [#1275502](https://bugzilla.redhat.com/1275502): [Tier]: Typo in the output while setting the wrong value of low/hi watermark
-- [#1275524](https://bugzilla.redhat.com/1275524): Data Tiering:heat counters not getting reset and also internal ops seem to be heating the files
-- [#1275616](https://bugzilla.redhat.com/1275616): snap-max-hard-limit for snapshots always shows as 256 in info file.
-- [#1275966](https://bugzilla.redhat.com/1275966): RFE : Exporting multiple subdirectory entries for gluster volume using cli
-- [#1276018](https://bugzilla.redhat.com/1276018): Wrong value of snap-max-hard-limit observed in 'gluster volume info'.
-- [#1276023](https://bugzilla.redhat.com/1276023): Clone creation should not be successful when the node participating in volume goes down.
-- [#1276028](https://bugzilla.redhat.com/1276028): [RFE] Geo-replication support for Volumes running in docker containers
-- [#1276031](https://bugzilla.redhat.com/1276031): Assertion failure while moving files between directories on a dispersed volume
-- [#1276141](https://bugzilla.redhat.com/1276141): Data Tiering: Tiering daemon is seeing each part of a file in a Disperse cold volume as a different file
-- [#1276203](https://bugzilla.redhat.com/1276203): add-brick on a replicate volume could lead to data-loss
-- [#1276243](https://bugzilla.redhat.com/1276243): gluster-nfs : Server crashed due to an invalid reference
-- [#1276386](https://bugzilla.redhat.com/1276386): vol replace-brick fails when transport.socket.bind-address is set in glusterd
-- [#1276423](https://bugzilla.redhat.com/1276423): glusterd: probing a new node (>=3.6) from 3.5 cluster is moving the peer to rejected state
-- [#1276562](https://bugzilla.redhat.com/1276562): Data Tiering:tiering daemon crashes when trying to heat the file
-- [#1276643](https://bugzilla.redhat.com/1276643): Upgrading a subset of cluster to 3.7.5 leads to issues with glusterd commands
-- [#1276675](https://bugzilla.redhat.com/1276675): Arbiter volume becomes replica volume in some cases
-- [#1276839](https://bugzilla.redhat.com/1276839): Geo-replication doesn't deal properly with sparse files
-- [#1276989](https://bugzilla.redhat.com/1276989): ec-readdir.t is failing consistently
-- [#1277024](https://bugzilla.redhat.com/1277024): BSD Smoke fails with _IOS_SAMP_DIR undeclared
-- [#1277076](https://bugzilla.redhat.com/1277076): Monitor should restart the worker process when Changelog agent dies
-- [#1277081](https://bugzilla.redhat.com/1277081): [New] - Message displayed after attach tier is misleading
-- [#1277105](https://bugzilla.redhat.com/1277105): vol quota enable fails when transport.socket.bind-address is set in glusterd
-- [#1277352](https://bugzilla.redhat.com/1277352): [Tier]: restarting volume reports "insert/update failure" in cold brick logs
-- [#1277481](https://bugzilla.redhat.com/1277481): Upgrading to 3.7.5-5 has changed volume to distributed disperse
-- [#1277533](https://bugzilla.redhat.com/1277533): stop-all-gluster-processes.sh doesn't return correct return status
-- [#1277716](https://bugzilla.redhat.com/1277716): fix lookup-unhashed for tiered volumes.
-- [#1277997](https://bugzilla.redhat.com/1277997): vol heal info fails when transport.socket.bind-address is set in glusterd
-- [#1278326](https://bugzilla.redhat.com/1278326): [New] - Files in a tiered volume get promoted when bitd signs them
-- [#1278418](https://bugzilla.redhat.com/1278418): Spurious failure in bug-1275616.t
-- [#1278476](https://bugzilla.redhat.com/1278476): move mount-nfs-auth.t to failed tests list
-- [#1278689](https://bugzilla.redhat.com/1278689): quota/marker: quota accounting goes wrong with renaming file when IO in progress
-- [#1278709](https://bugzilla.redhat.com/1278709): Tests/tiering: Correct typo in bug-1214222-directories_miising_after_attach_tier.t in bad_tests
-- [#1278927](https://bugzilla.redhat.com/1278927): [New] - Message shown in gluster vol tier <volname> status output is incorrect.
-- [#1279166](https://bugzilla.redhat.com/1279166): Data Tiering:Metadata changes to a file should not heat/promote the file
-- [#1279297](https://bugzilla.redhat.com/1279297): Remove bug-1275616.t from bad tests list
-- [#1279327](https://bugzilla.redhat.com/1279327): [Snapshot]: Clone creation fails on tiered volume with pre-validation failed message
-- [#1279376](https://bugzilla.redhat.com/1279376): Data Tiering:Rename of cold file to a hot file causing split brain and showing two copies of files in mount point
-- [#1279484](https://bugzilla.redhat.com/1279484): glusterfsd to support volfile-server-transport type "unix"
-- [#1279637](https://bugzilla.redhat.com/1279637): Data Tiering:Regression:Detach tier commit is passing when detach tier is in progress
-- [#1279705](https://bugzilla.redhat.com/1279705): AFR: 3-way-replication: Transport endpoint not connected error message not displayed when one of the replica pair is down
-- [#1279730](https://bugzilla.redhat.com/1279730): guest paused due to IO error from gluster based storage doesn't resume automatically or manually
-- [#1279739](https://bugzilla.redhat.com/1279739): libgfapi to support set_volfile-server-transport type "unix"
-- [#1279836](https://bugzilla.redhat.com/1279836): Fails to build twice in a row
-- [#1279921](https://bugzilla.redhat.com/1279921): volume info of %s obtained from %s: ambiguous uuid - Starting geo-rep session
-- [#1280428](https://bugzilla.redhat.com/1280428): fops-during-migration-pause.t spurious failure
-- [#1281230](https://bugzilla.redhat.com/1281230): dht must avoid fresh lookups when a single replica pair goes offline
-- [#1281265](https://bugzilla.redhat.com/1281265): DHT :- log is full of ' Found anomalies in /<DIR> (gfid = 00000000-0000-0000-0000-000000000000)' - for each Directory which was self healed
-- [#1281598](https://bugzilla.redhat.com/1281598): Data Tiering: "ls" count taking link files and promote/demote files into consideration both on fuse and nfs mount
-- [#1281892](https://bugzilla.redhat.com/1281892): packaging: gfind_missing_files are not in geo-rep %if ... %endif conditional
-- [#1282076](https://bugzilla.redhat.com/1282076): cache mode must be the default mode for tiered volumes
-- [#1282322](https://bugzilla.redhat.com/1282322): [GlusterD]: Volume start fails post add-brick on a volume which is not started
-- [#1282331](https://bugzilla.redhat.com/1282331): Geo-replication is logging in Localtime
-- [#1282390](https://bugzilla.redhat.com/1282390): Data Tiering:delete command rm -rf not deleting the linkto file (hashed) which is under migration and possible split-brain observed and possible disk wastage
-- [#1282461](https://bugzilla.redhat.com/1282461): [upgrade] Error messages seen in glusterd logs, while upgrading from RHGS 2.1.6 to RHGS 3.1
-- [#1282673](https://bugzilla.redhat.com/1282673): ./tests/basic/tier/record-metadata-heat.t is failing upstream
-- [#1282751](https://bugzilla.redhat.com/1282751): Large system file distribution is broken
-- [#1282761](https://bugzilla.redhat.com/1282761): EC: File healing promotes it to hot tier
-- [#1282915](https://bugzilla.redhat.com/1282915): glusterfs does not register with rpcbind on restart
-- [#1283032](https://bugzilla.redhat.com/1283032): While a file is self-healing, append to the file hangs
-- [#1283103](https://bugzilla.redhat.com/1283103): Setting security.* xattrs fails
-- [#1283178](https://bugzilla.redhat.com/1283178): [GlusterD]: Incorrect peer status showing if volume restart done before entire cluster update.
-- [#1283211](https://bugzilla.redhat.com/1283211): check_host_list() should be more robust
-- [#1283485](https://bugzilla.redhat.com/1283485): Warning messages seen in glusterd logs while executing gluster volume set help
-- [#1283488](https://bugzilla.redhat.com/1283488): [Tier]: Space is missed b/w the words in the detach tier stop error message
-- [#1283567](https://bugzilla.redhat.com/1283567): quota/marker: backward compatibility with quota xattr versioning
-- [#1283948](https://bugzilla.redhat.com/1283948): glupy default CFLAGS conflict with our CFLAGS when --enable-debug is used
-- [#1283983](https://bugzilla.redhat.com/1283983): nfs-ganesha: Upcall sent on null gfid
-- [#1284090](https://bugzilla.redhat.com/1284090): sometimes files are not getting demoted from hot tier to cold tier
-- [#1284357](https://bugzilla.redhat.com/1284357): Data Tiering: Change the error message when a detach-tier status is issued on a non-tier volume
-- [#1284365](https://bugzilla.redhat.com/1284365): Sharding - Extending writes filling incorrect final size in postbuf
-- [#1284372](https://bugzilla.redhat.com/1284372): [Tier]: Stopping and Starting tier volume triggers fixing layout which fails on local host
-- [#1284419](https://bugzilla.redhat.com/1284419): Resource leak in marker
-- [#1284752](https://bugzilla.redhat.com/1284752): quota cli: enhance quota list command to list usage even if the limit is not set
-- [#1284789](https://bugzilla.redhat.com/1284789): Snapshot creation after attach-tier causes glusterd crash
-- [#1284823](https://bugzilla.redhat.com/1284823): fops-during-migration.t fails if hot and cold tiers are dist-rep
-- [#1285046](https://bugzilla.redhat.com/1285046): AFR self-heal-daemon option is still set on volume though tier is detached
-- [#1285152](https://bugzilla.redhat.com/1285152): store afr pending xattrs as a volume option
-- [#1285173](https://bugzilla.redhat.com/1285173): Create doesn't remember flags it is opened with
-- [#1285230](https://bugzilla.redhat.com/1285230): Data Tiering:File create terminates with "Input/output error" as split brain is observed
-- [#1285241](https://bugzilla.redhat.com/1285241): Corrupted objects list does not get cleared even after all the files in the volume are deleted and count increases as old + new count
-- [#1285288](https://bugzilla.redhat.com/1285288): Better indication of arbiter brick presence in a volume.
-- [#1285483](https://bugzilla.redhat.com/1285483): legacy_many_files.t fails upstream
-- [#1285488](https://bugzilla.redhat.com/1285488): [geo-rep]: Recommended Shared volume use on geo-replication is broken
-- [#1285616](https://bugzilla.redhat.com/1285616): Brick crashes because of race in bit-rot init
-- [#1285634](https://bugzilla.redhat.com/1285634): Self-heal triggered every couple of seconds on a 3-node 1-arbiter setup
-- [#1285660](https://bugzilla.redhat.com/1285660): sharding - reads fail on sharded volume while running iozone
-- [#1285663](https://bugzilla.redhat.com/1285663): tiering: Seeing error messages E "/usr/lib64/glusterfs/3.7.5/xlator/features/changetimerecorder.so(ctr_lookup+0x54f) [0x7f6c435c116f] ) 0-ctr: invalid argument: loc->name [Invalid argument] after attach tier
-- [#1285968](https://bugzilla.redhat.com/1285968): cli/geo-rep : remove unused code
-- [#1285989](https://bugzilla.redhat.com/1285989): bitrot: bitrot scrub status command should display the correct value of total number of scrubbed, unsigned files
-- [#1286017](https://bugzilla.redhat.com/1286017): We need to skip data self-heal for arbiter bricks
-- [#1286029](https://bugzilla.redhat.com/1286029): Data Tiering:File create terminates with "Input/output error" as split brain is observed
-- [#1286279](https://bugzilla.redhat.com/1286279): tools/glusterfind: add --full option to query command
-- [#1286346](https://bugzilla.redhat.com/1286346): Data Tiering:Don't allow or reset the frequency threshold values to zero when record counter features.record-counter is turned off
-- [#1286656](https://bugzilla.redhat.com/1286656): Data Tiering:Read heat not getting calculated and read operations not heating the file with counter enabled
-- [#1286735](https://bugzilla.redhat.com/1286735): RFE: add setup and teardown for fuse tests
-- [#1286910](https://bugzilla.redhat.com/1286910): Tier: ec xattrs are set on a newly created file present in the non-ec hot tier
-- [#1286959](https://bugzilla.redhat.com/1286959): [GlusterD]: After log rotate of cmd_history.log file, the next executed gluster commands are not present in the cmd_history.log file.
-- [#1286974](https://bugzilla.redhat.com/1286974): Without detach tier commit, status changes back to tier migration
-- [#1286988](https://bugzilla.redhat.com/1286988): bitrot: gluster man page and gluster cli usage do not mention the new scrub status cmd
-- [#1287027](https://bugzilla.redhat.com/1287027): glusterd: cli is showing command success for rebalance commands (commands which use the op_sm framework) even though staging failed on a follower node.
-- [#1287455](https://bugzilla.redhat.com/1287455): glusterd: all the daemons of existing volume stopping upon peer detach
-- [#1287503](https://bugzilla.redhat.com/1287503): Full heal of volume fails on some nodes "Commit failed on X", and glustershd logs "Couldn't get xlator xl-0"
-- [#1287517](https://bugzilla.redhat.com/1287517): Memory leak in glusterd
-- [#1287519](https://bugzilla.redhat.com/1287519): [geo-rep+tiering]: symlinks are not getting synced to slave on tiered master setup
-- [#1287539](https://bugzilla.redhat.com/1287539): xattrs on directories are unavailable on distributed replicated volume after adding new bricks
-- [#1287723](https://bugzilla.redhat.com/1287723): Handle Rsync/Tar errors effectively
-- [#1287763](https://bugzilla.redhat.com/1287763): glusterfs does not allow passing standard SElinux mount options to fuse
-- [#1287842](https://bugzilla.redhat.com/1287842): Few snapshot creations fail with pre-validation failed message on tiered volume.
-- [#1287872](https://bugzilla.redhat.com/1287872): add bug-924726.t to ignore list in regression
-- [#1287992](https://bugzilla.redhat.com/1287992): [GlusterD] Probing a node having standalone volume should not happen
-- [#1287996](https://bugzilla.redhat.com/1287996): [Quota]: Peer status is in "Rejected" state with Quota enabled volume
-- [#1288019](https://bugzilla.redhat.com/1288019): Possible memory leak in the tiered daemon
-- [#1288059](https://bugzilla.redhat.com/1288059): glusterd: disable ping timer b/w glusterd and make epoll thread count default 1
-- [#1288474](https://bugzilla.redhat.com/1288474): tiering: quota list command is not working after attach or detach
-- [#1288517](https://bugzilla.redhat.com/1288517): Data Tiering: new set of gluster v tier commands not working as expected
-- [#1288857](https://bugzilla.redhat.com/1288857): Use after free bug in notify_kernel_loop in fuse-bridge code
-- [#1288995](https://bugzilla.redhat.com/1288995): [tiering]: Tier daemon crashed on two of eight nodes and lot of "demotion failed" seen in the system
-- [#1289068](https://bugzilla.redhat.com/1289068): libgfapi: Errno incorrectly set to EINVAL even on success
-- [#1289258](https://bugzilla.redhat.com/1289258): core: use syscall wrappers instead of making direct syscalls; pread, pwrite
-- [#1289428](https://bugzilla.redhat.com/1289428): Test ./tests/bugs/fuse/bug-924726.t fails
-- [#1289447](https://bugzilla.redhat.com/1289447): Sharding - Iozone on sharded volume fails on NFS
-- [#1289578](https://bugzilla.redhat.com/1289578): [Tier]: Failed to open "demotequeryfile-master-tier-dht" errors logged on the node having only cold bricks
-- [#1289584](https://bugzilla.redhat.com/1289584): brick_up_status in tests/volume.rc is not correct
-- [#1289602](https://bugzilla.redhat.com/1289602): After detach-tier start writes still go to hot tier
-- [#1289840](https://bugzilla.redhat.com/1289840): Sharding: Remove dependency on performance.strict-write-ordering
-- [#1289845](https://bugzilla.redhat.com/1289845): spurious failure of bug-1279376-rename-demoted-file.t
-- [#1289859](https://bugzilla.redhat.com/1289859): Symlinks Rename fails if Symlink not exists in Slave
-- [#1289869](https://bugzilla.redhat.com/1289869): Compile is broken in gluster master
-- [#1289916](https://bugzilla.redhat.com/1289916): Client will not get notified about changes to volume if node used while mounting goes down
-- [#1289935](https://bugzilla.redhat.com/1289935): Glusterfind hook script failing if /var/lib/glusterd/glusterfind dir was absent
-- [#1290125](https://bugzilla.redhat.com/1290125): tests/basic/afr/arbiter-statfs.t fails most of the time on NetBSD
-- [#1290151](https://bugzilla.redhat.com/1290151): hook script for CTDB should not change Samba config
-- [#1290270](https://bugzilla.redhat.com/1290270): Several intermittent regression failures
-- [#1290421](https://bugzilla.redhat.com/1290421): changelog: CHANGELOG rename error is logged on every changelog rollover
-- [#1290604](https://bugzilla.redhat.com/1290604): S30Samba scripts do not work on systemd systems
-- [#1290677](https://bugzilla.redhat.com/1290677): tiering: T files getting created, even after disk quota is exceeded
-- [#1290734](https://bugzilla.redhat.com/1290734): [GlusterD]: GlusterD log is filled with error messages - " Failed to aggregate response from node/brick"
-- [#1290766](https://bugzilla.redhat.com/1290766): [RFE] quota: enhance quota enable and disable process
-- [#1290865](https://bugzilla.redhat.com/1290865): nfs-ganesha server does not enter grace period during failover/failback
-- [#1290965](https://bugzilla.redhat.com/1290965): [Tiering] + [DHT] - Detach tier fails to migrate the files when there are corrupted objects in hot tier.
-- [#1290975](https://bugzilla.redhat.com/1290975): File is not demoted after self heal (split-brain)
-- [#1291212](https://bugzilla.redhat.com/1291212): Regular files are listed as 'T' files on nfs mount
-- [#1291259](https://bugzilla.redhat.com/1291259): Upcall/Cache-Invalidation: Use parent stbuf while updating parent entry
-- [#1291537](https://bugzilla.redhat.com/1291537): [RFE] Provide mechanism to spin up reproducible test environment for all developers
-- [#1291566](https://bugzilla.redhat.com/1291566): first file created after hot tier full fails to create, but later ends up as a stale erroneous file (file with ???????????)
-- [#1291603](https://bugzilla.redhat.com/1291603): [tiering]: read/write freq-threshold allows negative values
-- [#1291701](https://bugzilla.redhat.com/1291701): Renames/deletes failed with "No such file or directory" when few of the bricks from the hot tier went offline
-- [#1292067](https://bugzilla.redhat.com/1292067): Data Tiering:Watermark:File continuously trying to demote itself but failing " [dht-rebalance.c:608:__dht_rebalance_create_dst_file] 0-wmrk-tier-dht: chown failed for //AP.BH.avi on wmrk-cold-dht (No such file or directory)"
-- [#1292084](https://bugzilla.redhat.com/1292084): [georep+tiering]: Geo-replication sync is broken if cold tier is EC
-- [#1292112](https://bugzilla.redhat.com/1292112): [Tier]: start tier daemon using rebal tier start doesn't start tierd if it failed on any single node
-- [#1292379](https://bugzilla.redhat.com/1292379): md5sum of files mismatch after the self-heal is complete on the file
-- [#1292463](https://bugzilla.redhat.com/1292463): [geo-rep]: ChangelogException: [Errno 22] Invalid argument observed upon rebooting the ACTIVE master node
-- [#1292671](https://bugzilla.redhat.com/1292671): [tiering]: cluster.tier-max-files option in tiering is not honored
-- [#1292749](https://bugzilla.redhat.com/1292749): Friend update floods can render the cluster incapable of handling other commands
-- [#1292954](https://bugzilla.redhat.com/1292954): all: fix various errors/warnings reported by cppcheck
-- [#1293034](https://bugzilla.redhat.com/1293034): Creation of files on hot tier volume taking very long time
-- [#1293133](https://bugzilla.redhat.com/1293133): all: fix clang compile warnings
-- [#1293223](https://bugzilla.redhat.com/1293223): Disperse: Disperse volume (cold vol) crashes while writing files on tier volume
-- [#1293227](https://bugzilla.redhat.com/1293227): Minor improvements and code cleanup for locks translator
-- [#1293256](https://bugzilla.redhat.com/1293256): [Tier]: "Bad file descriptor" on removal of symlink only on tiered volume
-- [#1293293](https://bugzilla.redhat.com/1293293): afr: warn if pending xattrs missing during init()
-- [#1293414](https://bugzilla.redhat.com/1293414): [GlusterD]: Peer detach happening with a node which is hosting volume bricks
-- [#1293523](https://bugzilla.redhat.com/1293523): tier-snapshot.t runs too slowly on RHEL6
-- [#1293558](https://bugzilla.redhat.com/1293558): gluster cli crashed while performing 'gluster vol bitrot <vol_name> scrub status'
-- [#1293601](https://bugzilla.redhat.com/1293601): quota: handle quota xattr removal when quota is enabled again
-- [#1293932](https://bugzilla.redhat.com/1293932): [Tiering]: When files are heated continuously, promotions are so aggressive that files are promoted way beyond the high watermark
-- [#1293950](https://bugzilla.redhat.com/1293950): Gluster manpage doesn't show georeplication options
-- [#1293963](https://bugzilla.redhat.com/1293963): [Tier]: cannot delete symlinks from client using rm
-- [#1294051](https://bugzilla.redhat.com/1294051): Though files are in split-brain, able to perform writes to the file
-- [#1294053](https://bugzilla.redhat.com/1294053): Excessive logging in mount when bricks of the replica are down
-- [#1294209](https://bugzilla.redhat.com/1294209): glusterfs.spec.in: use %global per Fedora packaging guidelines
-- [#1294223](https://bugzilla.redhat.com/1294223): uses deprecated find -perm +xxx syntax
-- [#1294446](https://bugzilla.redhat.com/1294446): Ganesha hook script executes showmount and causes a hang
-- [#1294448](https://bugzilla.redhat.com/1294448): [tiering]: Incorrect display of 'gluster v tier help'
-- [#1294479](https://bugzilla.redhat.com/1294479): quota: limit xattr not healed for a sub-directory on newly added bricks
-- [#1294497](https://bugzilla.redhat.com/1294497): gluster volume status xml output of tiered volume has all the common services tagged under <coldBricks>
-- [#1294588](https://bugzilla.redhat.com/1294588): Dist-geo-rep : geo-rep worker crashed during init with [Errno 34] Numerical result out of range.
-- [#1294600](https://bugzilla.redhat.com/1294600): [Tier]: Killing glusterfs tier process doesn't reflect as failed/faulty in tier status
-- [#1294637](https://bugzilla.redhat.com/1294637): [tiering]: Tiering isn't started after attaching hot tier and hence no promotion/demotion
-- [#1294743](https://bugzilla.redhat.com/1294743): Lot of Inode not found messages in glfsheal log file
-- [#1294786](https://bugzilla.redhat.com/1294786): Good files do not get promoted in a tiered volume when bitrot is enabled
-- [#1294794](https://bugzilla.redhat.com/1294794): "Transport endpoint not connected" in heal info though hot tier bricks are up
-- [#1294809](https://bugzilla.redhat.com/1294809): mount options no longer valid: noexec, nosuid, noatime
-- [#1294826](https://bugzilla.redhat.com/1294826): Speed up regression tests
-- [#1295107](https://bugzilla.redhat.com/1295107): Fix mem leaks related to gfapi applications
-- [#1295504](https://bugzilla.redhat.com/1295504): S29CTDBsetup hook script contains outdated documentation comments
-- [#1295505](https://bugzilla.redhat.com/1295505): S29CTDB hook scripts contain comment references to downstream products and versions
-- [#1295520](https://bugzilla.redhat.com/1295520): Manual mount command in S29CTDBsetup script lacks options (_netdev ...)
-- [#1295702](https://bugzilla.redhat.com/1295702): Fix spurious failure in bug-1221481-allow-fops-on-dir-split-brain.t
-- [#1295704](https://bugzilla.redhat.com/1295704): RFE: Provide a mechanism to disable some tests in regression
-- [#1295763](https://bugzilla.redhat.com/1295763): Unable to modify quota hard limit on tier volume after disk limit got exceeded
-- [#1295784](https://bugzilla.redhat.com/1295784): dht: misleading indentation, gcc-6
-- [#1296174](https://bugzilla.redhat.com/1296174): geo-rep: hard-link rename issue on changelog replay
-- [#1296206](https://bugzilla.redhat.com/1296206): Geo-Replication Session goes "FAULTY" when application logs are rolled on master
-- [#1296399](https://bugzilla.redhat.com/1296399): Stale stat information for corrupted objects (replicated volume)
-- [#1296496](https://bugzilla.redhat.com/1296496): [georep+disperse]: Geo-Rep session went to faulty with errors "[Errno 5] Input/output error"
-- [#1296611](https://bugzilla.redhat.com/1296611): Rebalance crashed after detach tier.
-- [#1296818](https://bugzilla.redhat.com/1296818): Move away from gf_log completely to gf_msg
-- [#1296992](https://bugzilla.redhat.com/1296992): Stricter dependencies for glusterfs-server
-- [#1297172](https://bugzilla.redhat.com/1297172): Client self-heals block the FOP that triggered the heals
-- [#1297195](https://bugzilla.redhat.com/1297195): no-mtab (-n) mount option ignores the next mount option
-- [#1297311](https://bugzilla.redhat.com/1297311): Attach tier : Creates fail with invalid argument errors
-- [#1297373](https://bugzilla.redhat.com/1297373): [write-behind] : Write/Append to a full volume causes fuse client to crash
-- [#1297638](https://bugzilla.redhat.com/1297638): "gluster vol get volname user.metadata-text" command fails with "volume get option: failed: Did you mean cluster.metadata-self-heal?"
-- [#1297695](https://bugzilla.redhat.com/1297695): heal info reporting is slow when IO is in progress on the volume
-- [#1297740](https://bugzilla.redhat.com/1297740): tests/bugs/quota/bug-1049323.t fails in fedora
-- [#1297750](https://bugzilla.redhat.com/1297750): volume info xml does not show arbiter details
-- [#1297897](https://bugzilla.redhat.com/1297897): RFE: "heal" commands output should have fixed fields
-- [#1298111](https://bugzilla.redhat.com/1298111): Fix sparse-file-self-heal.t and remove from bad tests
-- [#1298439](https://bugzilla.redhat.com/1298439): GlusterD restart, starting the bricks when server quorum not met
-- [#1298498](https://bugzilla.redhat.com/1298498): glusterfs crash during load testing
-- [#1298520](https://bugzilla.redhat.com/1298520): tests : Modifying tests for crypt xlator
-- [#1299410](https://bugzilla.redhat.com/1299410): [Fuse: ] crash while --attribute-timeout and --entry-timeout are set to 0
-- [#1299497](https://bugzilla.redhat.com/1299497): Quota Aux mount crashed
-- [#1299710](https://bugzilla.redhat.com/1299710): Glusterd: Creation of volume is failing if one of the bricks is down on the server
-- [#1299819](https://bugzilla.redhat.com/1299819): Snapshot creation fails on a tiered volume
-- [#1300152](https://bugzilla.redhat.com/1300152): Rebalance process crashed during cleanup_and_exit
-- [#1300253](https://bugzilla.redhat.com/1300253): Test open-behind.t failing fairly often on NetBSD
-- [#1300412](https://bugzilla.redhat.com/1300412): Data Tiering:Change the default tiering values to optimize tiering settings
-- [#1300564](https://bugzilla.redhat.com/1300564): I/O failure during a graph change followed by an option change.
-- [#1300596](https://bugzilla.redhat.com/1300596): 'gluster volume get' returns 0 value for server-quorum-ratio
-- [#1300929](https://bugzilla.redhat.com/1300929): Lot of assertion failures are seen in nfs logs with disperse volume
-- [#1300956](https://bugzilla.redhat.com/1300956): [RFE] Schedule Geo-replication
-- [#1300979](https://bugzilla.redhat.com/1300979): [Snapshot]: Snapshot restore gets stuck in post validation.
-- [#1301032](https://bugzilla.redhat.com/1301032): [georep+tiering]: Hardlink sync is broken if master volume is tiered
-- [#1301227](https://bugzilla.redhat.com/1301227): Tiering should break out of iterating query file once cycle time completes.
-- [#1301352](https://bugzilla.redhat.com/1301352): Point users of glusterfs-hadoop to the upstream project
-- [#1301473](https://bugzilla.redhat.com/1301473): [Tiering]: Values of watermarks, min free disk etc will be miscalculated with quota set on root directory of gluster volume
-- [#1302200](https://bugzilla.redhat.com/1302200): Unable to get the client statedump, as /var/run/gluster directory is not available by default
-- [#1302201](https://bugzilla.redhat.com/1302201): Scrubber crash (list corruption)
-- [#1302205](https://bugzilla.redhat.com/1302205): Improve error message for unsupported clients
-- [#1302234](https://bugzilla.redhat.com/1302234): [SNAPSHOT]: Decrease the VHD_SIZE in snapshot.rc
-- [#1302257](https://bugzilla.redhat.com/1302257): [tiering]: Quota object limits not adhered to in a tiered volume
-- [#1302291](https://bugzilla.redhat.com/1302291): Self heal command gives error "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful"
-- [#1302307](https://bugzilla.redhat.com/1302307): Vim commands from a non-root user fail to execute on fuse mount with trash feature enabled
-- [#1302554](https://bugzilla.redhat.com/1302554): Able to create files when quota limit is set to 0
-- [#1302772](https://bugzilla.redhat.com/1302772): promotions not balanced across hot tier sub-volumes
-- [#1302948](https://bugzilla.redhat.com/1302948): tar complains: <fileName>: file changed as we read it
-- [#1303028](https://bugzilla.redhat.com/1303028): Tiering status and rebalance status stop getting updated
-- [#1303269](https://bugzilla.redhat.com/1303269): After GlusterD restart, Remove-brick commit happening even though data migration not completed.
-- [#1303501](https://bugzilla.redhat.com/1303501): access-control : spurious error log message on every setxattr call
-- [#1303828](https://bugzilla.redhat.com/1303828): [USS]: If .snaps already exists, ls -la lists it even after enabling USS
-- [#1303829](https://bugzilla.redhat.com/1303829): [feat] Compound translator
-- [#1303895](https://bugzilla.redhat.com/1303895): promotions not happening when space is created on previously full hot tier
-- [#1303945](https://bugzilla.redhat.com/1303945): Memory leak in dht
-- [#1303995](https://bugzilla.redhat.com/1303995): SMB: SMB crashes with AIO enabled on reads + vers=3.0
-- [#1304301](https://bugzilla.redhat.com/1304301): self-heald.t spurious failure
-- [#1304348](https://bugzilla.redhat.com/1304348): Allow GlusterFS to build with URCU 0.6
-- [#1304686](https://bugzilla.redhat.com/1304686): Start self-heal and display correct heal info after replace brick
-- [#1304966](https://bugzilla.redhat.com/1304966): DHT: Take blocking locks while renaming files
-- [#1304970](https://bugzilla.redhat.com/1304970): [quota]: Incorrect disk usage shown on a tiered volume
-- [#1304988](https://bugzilla.redhat.com/1304988): DHT: Rebalance hangs while migrating the files of disperse volume
-- [#1305277](https://bugzilla.redhat.com/1305277): [Tier]: End up in multiple entries of same file on client after rename which had hardlinks
-- [#1305839](https://bugzilla.redhat.com/1305839): Wrong interpretation of disk size in gverify.sh script
-- [#1306193](https://bugzilla.redhat.com/1306193): cd to .snaps fails with "transport endpoint not connected" after force start of the volume.
-- [#1306199](https://bugzilla.redhat.com/1306199): gluster volume heal info takes extra 2 seconds
-- [#1306220](https://bugzilla.redhat.com/1306220): quota: xattr trusted.glusterfs.quota.limit-objects not healed on the root of a newly added brick
-- [#1306264](https://bugzilla.redhat.com/1306264): glfs_lseek returns incorrect offset for SEEK_SET and SEEK_CUR flags
-- [#1306560](https://bugzilla.redhat.com/1306560): Accessing program list in build_prog_details() should be lock protected
-- [#1306807](https://bugzilla.redhat.com/1306807): use mutex on single core machines
-- [#1306852](https://bugzilla.redhat.com/1306852): Tiering threads can starve each other
-- [#1306897](https://bugzilla.redhat.com/1306897): Remove split-brain-healing.t from bad tests
-- [#1307208](https://bugzilla.redhat.com/1307208): dht: NULL layouts referenced while I/O is in progress on a tiered volume
-- [#1308402](https://bugzilla.redhat.com/1308402): Newly created volume start, starting the bricks when server quorum not met
-- [#1308900](https://bugzilla.redhat.com/1308900): build: fix build break
-- [#1308961](https://bugzilla.redhat.com/1308961): [New] - quarantine folder becomes empty and bitrot status does not list any files which are corrupted
-- [#1309238](https://bugzilla.redhat.com/1309238): Issues with refresh-config when the ".export_added" has different values on different nodes
-- [#1309342](https://bugzilla.redhat.com/1309342): Wrong permissions set on previous copy of truncated files inside trash directory
-- [#1309462](https://bugzilla.redhat.com/1309462): Upgrade from 3.7.6 to 3.7.8 causes massive drop in write performance. Fresh install of 3.7.8 also has low write performance
-- [#1309659](https://bugzilla.redhat.com/1309659): [tiering]: Performing a gluster vol reset turns off 'features.ctr-enabled' on a tiered volume
-- [#1309999](https://bugzilla.redhat.com/1309999): Data Tiering:Don't allow a detach-tier commit if detach-tier start has failed to complete
-- [#1310080](https://bugzilla.redhat.com/1310080): [RFE] Add --no-encode option to the `glusterfind pre` command
-- [#1310171](https://bugzilla.redhat.com/1310171): Incorrect file size on mount if stat is served from the arbiter brick.
-- [#1310437](https://bugzilla.redhat.com/1310437): rsyslog can't be completely removed due to dependency in libglusterfs
-- [#1310620](https://bugzilla.redhat.com/1310620): gfapi : listxattr is broken for handle ops.
-- [#1310677](https://bugzilla.redhat.com/1310677): glusterd crashed when probing a node with firewall enabled on only one node -- [#1310755](https://bugzilla.redhat.com/1310755): glusterd: coverity warning in glusterd-snapshot-utils.c copy_nfs_ganesha_file() -- [#1311124](https://bugzilla.redhat.com/1311124): Implement inode_forget_cbk() similar fops in gfapi -- [#1311146](https://bugzilla.redhat.com/1311146): glfs_dup() functionality is broken -- [#1311178](https://bugzilla.redhat.com/1311178): Tier: Actual files are not demoted and keep on trying to demoted deleted files -- [#1311874](https://bugzilla.redhat.com/1311874): Peer probe from a reinstalled node should fail -- [#1312036](https://bugzilla.redhat.com/1312036): tests: upstream test infra brocken -- [#1312226](https://bugzilla.redhat.com/1312226): Readdirp op_ret is modified by client xlator in case of xdata_rsp presence -- [#1312346](https://bugzilla.redhat.com/1312346): nfs: fix lock variable type -- [#1312354](https://bugzilla.redhat.com/1312354): changelog: fix typecasting of function -- [#1312816](https://bugzilla.redhat.com/1312816): gfid-reset of a directory in distributed replicate volume doesn't set gfid on 2nd till last subvolumes -- [#1312845](https://bugzilla.redhat.com/1312845): Protocol server/client handshake gap -- [#1312897](https://bugzilla.redhat.com/1312897): glusterfs-server %post script is not quiet, prints "success" during installation -- [#1313135](https://bugzilla.redhat.com/1313135): RFE: Need type of gfid in index_readdir -- [#1313206](https://bugzilla.redhat.com/1313206): Encrypted rpc clients do not reconnect sometimes -- [#1313228](https://bugzilla.redhat.com/1313228): promotions and demotions not happening after attach tier due to fix layout taking very long time(3 days) -- [#1313293](https://bugzilla.redhat.com/1313293): [HC] glusterfs mount crashed -- [#1313300](https://bugzilla.redhat.com/1313300): quota: reduce latency for testcase ./tests/bugs/quota/bug-1293601.t -- [#1313303](https://bugzilla.redhat.com/1313303): [geo-rep]: Session goes to faulty with Errno 13: Permission denied -- [#1313495](https://bugzilla.redhat.com/1313495): migrate files based on file size -- [#1313628](https://bugzilla.redhat.com/1313628): Brick ports get changed after GlusterD restart -- [#1313775](https://bugzilla.redhat.com/1313775): ec-read-policy.t can report a false-failure -- [#1313901](https://bugzilla.redhat.com/1313901): glusterd: does not start -- [#1314150](https://bugzilla.redhat.com/1314150): Choose self-heal source as local subvolume if possible -- [#1314204](https://bugzilla.redhat.com/1314204): nfs-ganesha setup fails on fedora -- [#1314291](https://bugzilla.redhat.com/1314291): tier: GCC throws Unused variable warning for conf in tier_link_cbk function -- [#1314549](https://bugzilla.redhat.com/1314549): remove replace-brick-self-heal.t from bad tests -- [#1314649](https://bugzilla.redhat.com/1314649): disperse: Provide an option to enable/disable eager lock -- [#1315024](https://bugzilla.redhat.com/1315024): glusterfs-libs postun scriptlet fail /sbin/ldconfig: relative path '1' used to build cache -- [#1315168](https://bugzilla.redhat.com/1315168): Fd based fops should not be logging ENOENT/ESTALE -- [#1315186](https://bugzilla.redhat.com/1315186): setting lower op-version should throw failure message -- [#1315465](https://bugzilla.redhat.com/1315465): glusterfs brick process crashed -- [#1315560](https://bugzilla.redhat.com/1315560): ./tests/basic/tier/tier-file-create.t dumping core fairly often on 
build machines in Linux -- [#1315601](https://bugzilla.redhat.com/1315601): Geo-replication CPU usage is 100% -- [#1315659](https://bugzilla.redhat.com/1315659): [Tier]: Following volume restart, tierd shows failure at status on some nodes -- [#1315666](https://bugzilla.redhat.com/1315666): Data Tiering:tier volume status shows as in-progress on all nodes of a cluster even if the node is not part of volume -- [#1316327](https://bugzilla.redhat.com/1316327): Upgrade from 3.7.6 to 3.7.8 causes massive drop in write performance. Fresh install of 3.7.8 also has low write performance -- [#1316437](https://bugzilla.redhat.com/1316437): snapd doesn't come up automatically after node reboot. -- [#1316462](https://bugzilla.redhat.com/1316462): Fix possible failure in tests/basic/afr/arbiter.t -- [#1316499](https://bugzilla.redhat.com/1316499): volume set on user.* domain trims all white spaces in the value -- [#1316819](https://bugzilla.redhat.com/1316819): Errors seen in cli.log, while executing the command 'gluster snapshot info --xml' -- [#1316848](https://bugzilla.redhat.com/1316848): Peers goes to rejected state after reboot of one node when quota is enabled on cloned volume. -- [#1317278](https://bugzilla.redhat.com/1317278): GlusterFS 3.8.0 tracker -- [#1317361](https://bugzilla.redhat.com/1317361): Do not succeed mkdir without gfid-req -- [#1317424](https://bugzilla.redhat.com/1317424): nfs-ganesha server do not enter grace period during failover/failback -- [#1317785](https://bugzilla.redhat.com/1317785): Cache swift xattrs -- [#1317902](https://bugzilla.redhat.com/1317902): Different epoch values for each of NFS-Ganesha heads -- [#1317948](https://bugzilla.redhat.com/1317948): inode ref leaks with perf-test.sh -- [#1318107](https://bugzilla.redhat.com/1318107): Typo in log message for posix_mkdir log -- [#1318158](https://bugzilla.redhat.com/1318158): Client's App is having issues retrieving files from share 1002976973 -- [#1318544](https://bugzilla.redhat.com/1318544): Glusterd crashed during volume status of snapd daemon -- [#1318546](https://bugzilla.redhat.com/1318546): Glusterd crashed just after a peer probe command failed. -- [#1318751](https://bugzilla.redhat.com/1318751): cluster/afr: Fix partial heals in 3-way replication -- [#1318757](https://bugzilla.redhat.com/1318757): trash xlator : trash_unlink_mkdir_cbk() enters in an infinte loop which results segfault -- [#1319374](https://bugzilla.redhat.com/1319374): smbd crashes while accessing multiple volume shares via same client -- [#1319581](https://bugzilla.redhat.com/1319581): Marker: Lot of dict_get errors in brick log!! -- [#1319706](https://bugzilla.redhat.com/1319706): Add a script that converts the gfid-string of a directory into absolute path name w.r.t the brick path. -- [#1319717](https://bugzilla.redhat.com/1319717): glusterfind pre test projects_media2 /tmp/123 rh-storage2 - pre failed: Traceback ... 
-- [#1319992](https://bugzilla.redhat.com/1319992): RFE: Lease support for gluster -- [#1320101](https://bugzilla.redhat.com/1320101): Client log gets flooded by default with fop stats under DEBUG level -- [#1320388](https://bugzilla.redhat.com/1320388): [GSS]-gluster v heal volname info does not work with enabled ssl/tls -- [#1320458](https://bugzilla.redhat.com/1320458): Peer information is not propagated to all the nodes in the cluster, when the peer is probed with its second interface FQDN/IP -- [#1320489](https://bugzilla.redhat.com/1320489): glfs-mgmt: fix connecting to multiple volfile transports -- [#1320716](https://bugzilla.redhat.com/1320716): RFE Sort volume quota <volume> list output alphabetically by path -- [#1320818](https://bugzilla.redhat.com/1320818): Over some time Files which were accessible become inaccessible(music files) -- [#1321322](https://bugzilla.redhat.com/1321322): afr: add mtime based split-brain resolution to CLI -- [#1321554](https://bugzilla.redhat.com/1321554): assert failure happens when parallel rm -rf is issued on nfs mounts -- [#1321762](https://bugzilla.redhat.com/1321762): glusterd: response not aligned -- [#1321872](https://bugzilla.redhat.com/1321872): el6 - Installing glusterfs-ganesha-3.7.9-1.el6rhs.x86_64 fails with dependency on /usr/bin/dbus-send -- [#1321955](https://bugzilla.redhat.com/1321955): Self-heal and manual heal not healing some file -- [#1322214](https://bugzilla.redhat.com/1322214): [HC] Add disk in a Hyper-converged environment fails when glusterfs is running in directIO mode -- [#1322237](https://bugzilla.redhat.com/1322237): glusterd pmap scan wastes time scanning for not relevant ports -- [#1322253](https://bugzilla.redhat.com/1322253): gluster volume heal info shows conservative merge entries as in split-brain -- [#1322262](https://bugzilla.redhat.com/1322262): Glusterd crashes when a message is passed through rpc which is not available -- [#1322320](https://bugzilla.redhat.com/1322320): build: git ignore files generated by fdl xlator -- [#1322323](https://bugzilla.redhat.com/1322323): fdl: fix make clean -- [#1322489](https://bugzilla.redhat.com/1322489): marker: account goes bad with rm -rf -- [#1322772](https://bugzilla.redhat.com/1322772): glusterd: glusted didn't come up after node reboot error" realpath () failed for brick /run/gluster/snaps/130949baac8843cda443cf8a6441157f/brick3/b3. 
The underlying file system may be in bad state [No such file or directory]" -- [#1322801](https://bugzilla.redhat.com/1322801): nfs-ganesha installation : no pacemaker package installed for RHEL 6.7 -- [#1322805](https://bugzilla.redhat.com/1322805): [scale] Brick process does not start after node reboot -- [#1322825](https://bugzilla.redhat.com/1322825): IO-stats, client profile is overwritten when it is on the same node as bricks -- [#1322850](https://bugzilla.redhat.com/1322850): Healing queue rarely empty -- [#1323040](https://bugzilla.redhat.com/1323040): Inconsistent directory structure on dht subvols caused by parent layouts going stale during entry create operations because of fix-layout -- [#1323287](https://bugzilla.redhat.com/1323287): TIER : Attach tier fails -- [#1323360](https://bugzilla.redhat.com/1323360): quota/cli: quota list with path not working when limit is not set -- [#1323486](https://bugzilla.redhat.com/1323486): quota: check inode limits only when new file/dir is created and not with write FOP -- [#1323659](https://bugzilla.redhat.com/1323659): rpc: assign port only if it is unreserved -- [#1324004](https://bugzilla.redhat.com/1324004): arbiter volume write performance is bad. -- [#1324439](https://bugzilla.redhat.com/1324439): SAMBA+TIER : Wrong message display.On detach tier success the message reflects Tier command failed. -- [#1324509](https://bugzilla.redhat.com/1324509): Continuous nfs_grace_monitor log messages observed in /var/log/messages -- [#1325683](https://bugzilla.redhat.com/1325683): the wrong variable was being checked for gf_strdup -- [#1325822](https://bugzilla.redhat.com/1325822): Too many log messages showing inode ctx is NULL for 00000000-0000-0000-0000-000000000000 -- [#1325841](https://bugzilla.redhat.com/1325841): Volume stop is failing when one of brick is down due to underlying filesystem crash -- [#1326085](https://bugzilla.redhat.com/1326085): [rfe]posix-locks: Lock migration -- [#1326308](https://bugzilla.redhat.com/1326308): WORM/Retention Feature -- [#1326410](https://bugzilla.redhat.com/1326410): /var/lib/glusterd/$few-directories not owned by any package, causing it to remain after glusterfs-server is uninstalled -- [#1326627](https://bugzilla.redhat.com/1326627): nfs-ganesha crashes with segfault error while doing refresh config on volume. 
-- [#1327174](https://bugzilla.redhat.com/1327174): op-version for 3.8 features should be set to GD_OP_VERSION_3_8_0 -- [#1327507](https://bugzilla.redhat.com/1327507): [DHT-Rebalance]: with few brick process down, rebalance process isn't killed even after stopping rebalance process -- [#1327553](https://bugzilla.redhat.com/1327553): [geo-rep]: geo status shows $MASTER Nodes always with hostname even if volume is configured with IP -- [#1327976](https://bugzilla.redhat.com/1327976): [RFE] Provide vagrant developer setup for GlusterFS -- [#1328010](https://bugzilla.redhat.com/1328010): snapshot-clone: clone volume doesn't start after node reboot -- [#1328043](https://bugzilla.redhat.com/1328043): [FEAT] Renaming NSR to JBR -- [#1328399](https://bugzilla.redhat.com/1328399): [geo-rep]: schedule_georep.py doesn't touch the mount in every iteration -- [#1328502](https://bugzilla.redhat.com/1328502): Move FOP enumerations and other network protocol bits to XDR generated headers -- [#1328696](https://bugzilla.redhat.com/1328696): quota : fix null dereference issue -- [#1329129](https://bugzilla.redhat.com/1329129): runner: extract and return actual exit status of child -- [#1329501](https://bugzilla.redhat.com/1329501): self-heal does fsyncs even after setting ensure-durability off -- [#1329503](https://bugzilla.redhat.com/1329503): [tiering]: during detach tier operation, Input/output error is seen with new file writes on NFS mount -- [#1329773](https://bugzilla.redhat.com/1329773): Inode leaks found in data-self-heal -- [#1329870](https://bugzilla.redhat.com/1329870): Lots of [global.glusterfs - usage-type (null) memusage] are seen in statedump -- [#1330052](https://bugzilla.redhat.com/1330052): [RFE] We need more debug info from stack wind and unwind calls -- [#1330225](https://bugzilla.redhat.com/1330225): gluster is not using pthread_equal to compare thread -- [#1330248](https://bugzilla.redhat.com/1330248): glusterd: SSL certificate depth volume option is incorrect -- [#1330346](https://bugzilla.redhat.com/1330346): distaflibs: structure directory tree to follow setuptools namespace packages format -- [#1330353](https://bugzilla.redhat.com/1330353): [Tiering]: promotion of files may not be balanced on distributed hot tier when promoting files with size as that of max.mb -- [#1330476](https://bugzilla.redhat.com/1330476): libgfapi:Setting need_lookup on wrong list -- [#1330481](https://bugzilla.redhat.com/1330481): glusterd restart is failing if volume brick is down due to underlying FS crash. 
-- [#1330567](https://bugzilla.redhat.com/1330567): SAMBA+TIER : File size is not getting updated when created on windows samba share mount -- [#1330583](https://bugzilla.redhat.com/1330583): glusterfs-libs postun ldconfig: relative path `1' used to build cache -- [#1330616](https://bugzilla.redhat.com/1330616): Minor improvements and code cleanup for libglusterfs -- [#1330974](https://bugzilla.redhat.com/1330974): Swap order of GF_EVENT_SOME_CHILD_DOWN enum to match the release3.-7 branch -- [#1331042](https://bugzilla.redhat.com/1331042): glusterfsd: return actual exit status on mount process -- [#1331253](https://bugzilla.redhat.com/1331253): glusterd: fix max pmap alloc to GF_PORT_MAX -- [#1331289](https://bugzilla.redhat.com/1331289): glusterd memory overcommit -- [#1331658](https://bugzilla.redhat.com/1331658): [geo-rep]: schedule_georep.py doesn't work when invoked using cron -- [#1332020](https://bugzilla.redhat.com/1332020): multiple regression failures for tests/basic/quota-ancestry-building.t -- [#1332021](https://bugzilla.redhat.com/1332021): multiple failures for testcase: tests/basic/inode-quota-enforcing.t -- [#1332022](https://bugzilla.redhat.com/1332022): multiple failures for testcase: tests/bugs/disperse/bug-1304988.t -- [#1332045](https://bugzilla.redhat.com/1332045): multiple failures for testcase: tests/basic/quota.t -- [#1332162](https://bugzilla.redhat.com/1332162): Support mandatory locking in glusterfs -- [#1332370](https://bugzilla.redhat.com/1332370): DHT: Once remove brick start failed in between Remove brick commit should not be allowed -- [#1332396](https://bugzilla.redhat.com/1332396): posix: Set correct d_type for readdirp() calls -- [#1332414](https://bugzilla.redhat.com/1332414): protocol/server: address double free's -- [#1332788](https://bugzilla.redhat.com/1332788): Wrong op-version for mandatory-locks volume set option -- [#1332789](https://bugzilla.redhat.com/1332789): quota: client gets IO error instead of disk quota exceed when the limit is exceeded -- [#1332839](https://bugzilla.redhat.com/1332839): values for Number of Scrubbed files, Number of Unsigned files, Last completed scrub time and Duration of last scrub are shown as zeros in bit rot scrub status -- [#1332845](https://bugzilla.redhat.com/1332845): Disperse volume fails on high load and logs show some assertion failures -- [#1332864](https://bugzilla.redhat.com/1332864): glusterd + bitrot : Creating clone of snapshot. error "xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/features/bitrot.so: cannot open shared object file: -- [#1333243](https://bugzilla.redhat.com/1333243): [AFR]: "volume heal info" command is failing during in-service upgrade to latest. -- [#1333244](https://bugzilla.redhat.com/1333244): Fix excessive logging due to NULL dict in dht -- [#1333266](https://bugzilla.redhat.com/1333266): SMB:while running I/O on cifs mount and doing graph switch causes cifs mount to hang. 
-- [#1333711](https://bugzilla.redhat.com/1333711): [scale] Brick process does not start after node reboot -- [#1333803](https://bugzilla.redhat.com/1333803): Detach tier fire before the background fixlayout is complete may result in failure -- [#1333900](https://bugzilla.redhat.com/1333900): /var/lib/glusterd/$few-directories not owned by any package, causing it to remain after glusterfs-server is uninstalled -- [#1334074](https://bugzilla.redhat.com/1334074): No xml output on gluster volume heal info command with --xml -- [#1334268](https://bugzilla.redhat.com/1334268): GlusterFS 3.8 fails to build in the CentOS Community Build System -- [#1334287](https://bugzilla.redhat.com/1334287): Under high read load, sometimes the message "XDR decoding failed" appears in the logs and read fails -- [#1334443](https://bugzilla.redhat.com/1334443): SAMBA-VSS : Permission denied issue while restoring the directory from windows client 1 when files are deleted from windows client 2 -- [#1334699](https://bugzilla.redhat.com/1334699): readdir-ahead does not fetch xattrs that md-cache needs in it's internal calls -- [#1334836](https://bugzilla.redhat.com/1334836): [features/worm] - when disabled, worm xl should simply pass requested fops to its child xl -- [#1334994](https://bugzilla.redhat.com/1334994): Fix the message ids in Client -- [#1335017](https://bugzilla.redhat.com/1335017): set errno in case of inode_link failures -- [#1335282](https://bugzilla.redhat.com/1335282): Wrong constant used in length based comparison for XATTR_SECURITY_PREFIX -- [#1335283](https://bugzilla.redhat.com/1335283): Self Heal fails on a replica3 volume with 'disk quota exceeded' -- [#1335284](https://bugzilla.redhat.com/1335284): [HC] Add disk in a Hyper-converged environment fails when glusterfs is running in directIO mode -- [#1335285](https://bugzilla.redhat.com/1335285): tar complains: <fileName>: file changed as we read it -- [#1335433](https://bugzilla.redhat.com/1335433): Self heal shows different information for the same volume from each node -- [#1335726](https://bugzilla.redhat.com/1335726): stop all gluster processes should also also include glusterfs mount process -- [#1335730](https://bugzilla.redhat.com/1335730): mount/fuse: Logging improvements -- [#1335822](https://bugzilla.redhat.com/1335822): Revert "features/shard: Make o-direct writes work with sharding: http://review.gluster.org/#/c/13846/" -- [#1335829](https://bugzilla.redhat.com/1335829): Heal info shows split-brain for .shard directory though only one brick was down -- [#1336136](https://bugzilla.redhat.com/1336136): PREFIX is not honoured during build and install -- [#1336152](https://bugzilla.redhat.com/1336152): [Tiering]: Files remain in hot tier even after detach tier completes -- [#1336198](https://bugzilla.redhat.com/1336198): failover is not working with latest builds. 
-- [#1336268](https://bugzilla.redhat.com/1336268): features/index: clang compile warnings in index.c -- [#1336285](https://bugzilla.redhat.com/1336285): Worker dies with [Errno 5] Input/output error upon creation of entries at slave -- [#1336472](https://bugzilla.redhat.com/1336472): [Tiering]: The message 'Max cycle time reached..exiting migration' incorrectly displayed as an 'error' in the logs -- [#1336704](https://bugzilla.redhat.com/1336704): [geo-rep]: Multiple geo-rep session to the same slave is allowed for different users -- [#1336794](https://bugzilla.redhat.com/1336794): assorted typos and spelling mistakes from Debian lintian -- [#1336798](https://bugzilla.redhat.com/1336798): Unexporting a volume sometimes fails with "Dynamic export addition/deletion failed". -- [#1336801](https://bugzilla.redhat.com/1336801): ganesha exported volumes doesn't get synced up on shutdown node when it comes up. -- [#1336854](https://bugzilla.redhat.com/1336854): scripts: bash-isms in scripts -- [#1336947](https://bugzilla.redhat.com/1336947): [NFS-Ganesha] : stonith-enabled option not set with new versions of cman,pacemaker,corosync and pcs -- [#1337114](https://bugzilla.redhat.com/1337114): Modified volume options are not syncing once glusterd comes up. -- [#1337127](https://bugzilla.redhat.com/1337127): rpc: change client insecure port ceiling from 65535 to 49151 -- [#1337130](https://bugzilla.redhat.com/1337130): Revert "glusterd/afr: store afr pending xattrs as a volume option" patch on 3.8 branch -- [#1337387](https://bugzilla.redhat.com/1337387): Add arbiter brick hotplug -- [#1337394](https://bugzilla.redhat.com/1337394): DHT : few Files are not accessible and not listed on mount + more than one Directory have same gfid + (sometimes) attributes has ?? in ls output after renaming Directories from multiple client at same time -- [#1337596](https://bugzilla.redhat.com/1337596): Mounting a volume over NFS with a subdir followed by a / returns "Invalid argument" -- [#1337638](https://bugzilla.redhat.com/1337638): Leases: Fix lease failures in certain scenarios -- [#1337652](https://bugzilla.redhat.com/1337652): log flooded with Could not map name=xxxx to a UUID when config'd with long hostnames -- [#1337780](https://bugzilla.redhat.com/1337780): tests/bugs/write-behind/1279730.t fails spuriously -- [#1337795](https://bugzilla.redhat.com/1337795): tests/basic/afr/tarissue.t fails regression -- [#1337805](https://bugzilla.redhat.com/1337805): Mandatory locks are not migrated during lock migration -- [#1337822](https://bugzilla.redhat.com/1337822): one of vm goes to paused state when network goes down and comes up back -- [#1337839](https://bugzilla.redhat.com/1337839): Files present in the .shard folder even after deleting all the vms from the UI -- [#1337870](https://bugzilla.redhat.com/1337870): Some of VMs go to paused state when there is concurrent I/O on vms -- [#1337908](https://bugzilla.redhat.com/1337908): SAMBA+TIER : Wrong message display.On detach tier success the message reflects Tier command failed. 
-- [#1338051](https://bugzilla.redhat.com/1338051): ENOTCONN error during parallel rmdir -- [#1338501](https://bugzilla.redhat.com/1338501): implement meta-lock/unlock for lock migration -- [#1338669](https://bugzilla.redhat.com/1338669): AFR : fuse,nfs mount hangs when directories with same names are created and deleted continuously -- [#1338968](https://bugzilla.redhat.com/1338968): common-ha: ganesha.nfsd not put into NFS-GRACE after fail-back -- [#1339137](https://bugzilla.redhat.com/1339137): fuse: In fuse_first_lookup(), dict is not un-referenced in case create_frame returns an empty pointer. -- [#1339192](https://bugzilla.redhat.com/1339192): Missing autotools helper config.* files -- [#1339228](https://bugzilla.redhat.com/1339228): gfapi: set mem_acct for the variables created for upcall -- [#1339436](https://bugzilla.redhat.com/1339436): Full heal of a sub-directory does not clean up name-indices when granular-entry-heal is enabled. -- [#1339610](https://bugzilla.redhat.com/1339610): glusterfs-libs postun ldconfig: relative path `1' used to build cache -- [#1339639](https://bugzilla.redhat.com/1339639): RFE : Feature: Automagic unsplit-brain policies for AFR -- [#1340487](https://bugzilla.redhat.com/1340487): copy-export-ganesha.sh does not have a correct shebang -- [#1340935](https://bugzilla.redhat.com/1340935): Automount fails because /sbin/mount.glusterfs does not accept the -s option -- [#1340991](https://bugzilla.redhat.com/1340991): [granular entry sh] - Add more tests -- [#1341069](https://bugzilla.redhat.com/1341069): [geo-rep]: Monitor crashed with [Errno 3] No such process -- [#1341108](https://bugzilla.redhat.com/1341108): [geo-rep]: If the session is renamed, geo-rep configuration are not retained -- [#1341295](https://bugzilla.redhat.com/1341295): build: RHEL7 unpackaged files /var/lib/glusterd/hooks/.../S57glusterfind-delete-post.{pyc,pyo} -- [#1341477](https://bugzilla.redhat.com/1341477): ERROR and Warning message on writing a file from mount point "null gfid for path (null)" repeated 3 times between" -- [#1341556](https://bugzilla.redhat.com/1341556): [features/worm] Unwind FOPs with op_errno and add gf_worm prefix to functions -- [#1341697](https://bugzilla.redhat.com/1341697): Add ability to set oom_score_adj for glusterfs process -- [#1341770](https://bugzilla.redhat.com/1341770): After setting up ganesha on RHEL 6, nodes remains in stopped state and grace related failures observed in pcs status -- [#1341944](https://bugzilla.redhat.com/1341944): [geo-rep]: Snapshot creation having geo-rep session is broken -- [#1342083](https://bugzilla.redhat.com/1342083): changelog: changelog_rollover breaks when number of fds opened is more than 1024 -- [#1342178](https://bugzilla.redhat.com/1342178): Directory creation(mkdir) fails when the remove brick is initiated for replicated volumes accessing via nfs-ganesha -- [#1342275](https://bugzilla.redhat.com/1342275): [PATCH] Small typo fixes -- [#1342350](https://bugzilla.redhat.com/1342350): Volume set option not present to enable leases -- [#1342372](https://bugzilla.redhat.com/1342372): [quota+snapshot]: Directories are inaccessible from activated snapshot, when the snapshot was created during directory creation -- [#1342387](https://bugzilla.redhat.com/1342387): Log parameters such as the gfid, fd address, offset and length of the reads upon failure for easier debugging -- [#1342452](https://bugzilla.redhat.com/1342452): upgrade path when slave volume uuid used in geo-rep session -- 
[#1342620](https://bugzilla.redhat.com/1342620): libglusterfs: race conditions and illegal mem access in timer -- [#1342634](https://bugzilla.redhat.com/1342634): [georep]: Stopping volume fails if it has geo-rep session (Even in stopped state) -- [#1342954](https://bugzilla.redhat.com/1342954): self heal deamon killed due to oom kills on a dist-disperse volume using nfs ganesha -- [#1343287](https://bugzilla.redhat.com/1343287): Enabling glusternfs with nfs.rpc-auth-allow to many hosts failed -- [#1343368](https://bugzilla.redhat.com/1343368): Input / Output when chmoding files on NFS mount point -- [#1344421](https://bugzilla.redhat.com/1344421): fd leak in disperse -- [#1344559](https://bugzilla.redhat.com/1344559): conservative merge happening on a x3 volume for a deleted file -- [#1344594](https://bugzilla.redhat.com/1344594): [disperse] mkdir after re balance give Input/Output Error -- [#1344607](https://bugzilla.redhat.com/1344607): [geo-rep]: Add-Brick use case: create push-pem force on existing geo-rep fails -- [#1344631](https://bugzilla.redhat.com/1344631): fail delete volume operation if one of the glusterd instance is down in cluster -- [#1345713](https://bugzilla.redhat.com/1345713): [features/worm] - write FOP should pass for the normal files -- [#1345977](https://bugzilla.redhat.com/1345977): api: revert glfs_ipc_xd intended for 4.0 -- [#1346222](https://bugzilla.redhat.com/1346222): Add graph for decompounder xlator diff --git a/doc/release-notes/3.8.1.md b/doc/release-notes/3.8.1.md deleted file mode 100644 index fec35440704..00000000000 --- a/doc/release-notes/3.8.1.md +++ /dev/null @@ -1,42 +0,0 @@ -# Release notes for Gluster 3.8.1 - -This is a bugfix release. The [Release Notes for 3.8.0](3.8.0.md) contain a -listing of all the new features that were added and bugs fixed in the GlusterFS -3.8 stable release. - -## Bugs addressed - -A total of 35 patches have been sent, addressing 32 bugs: - -- [#1345883](https://bugzilla.redhat.com/1345883): [geo-rep]: Worker died with [Errno 2] No such file or directory -- [#1346134](https://bugzilla.redhat.com/1346134): quota : rectify quota-deem-statfs default value in gluster v set help command -- [#1346158](https://bugzilla.redhat.com/1346158): Possible crash due to a timer cancellation race -- [#1346750](https://bugzilla.redhat.com/1346750): Unsafe access to inode->fd_list -- [#1347207](https://bugzilla.redhat.com/1347207): Old documentation link in log during Geo-rep MISCONFIGURATION -- [#1347355](https://bugzilla.redhat.com/1347355): glusterd: SuSE build system error for incorrect strcat, strncat usage -- [#1347489](https://bugzilla.redhat.com/1347489): IO ERROR when multiple graph switches -- [#1347509](https://bugzilla.redhat.com/1347509): Data Tiering:tier volume status shows as in-progress on all nodes of a cluster even if the node is not part of volume -- [#1347524](https://bugzilla.redhat.com/1347524): NFS+attach tier:IOs hang while attach tier is issued -- [#1347529](https://bugzilla.redhat.com/1347529): rm -rf to a dir gives directory not empty(ENOTEMPTY) error -- [#1347553](https://bugzilla.redhat.com/1347553): O_DIRECT support for sharding -- [#1347590](https://bugzilla.redhat.com/1347590): Ganesha+Tiering: Continuous "0-glfs_h_poll_cache_invalidation: invalid argument" messages getting logged in ganesha-gfapi logs. 
-- [#1348055](https://bugzilla.redhat.com/1348055): cli core dumped while providing/not wrong values during arbiter replica volume -- [#1348060](https://bugzilla.redhat.com/1348060): Worker dies with [Errno 5] Input/output error upon creation of entries at slave -- [#1348086](https://bugzilla.redhat.com/1348086): [geo-rep]: Worker crashed with "KeyError: " -- [#1349274](https://bugzilla.redhat.com/1349274): [geo-rep]: If the data is copied from .snaps directory to the master, it doesn't get sync to slave [First Copy] -- [#1349711](https://bugzilla.redhat.com/1349711): [Granular entry sh] - Implement renaming of indices in index translator -- [#1349879](https://bugzilla.redhat.com/1349879): AFR winds a few reads of a file in metadata split-brain. -- [#1350326](https://bugzilla.redhat.com/1350326): Protocol client not mounting volumes running on older versions. -- [#1350785](https://bugzilla.redhat.com/1350785): Add relative path validation for gluster copy file utility -- [#1350787](https://bugzilla.redhat.com/1350787): gfapi: in case of handle based APIs, close glfd after successful create -- [#1350789](https://bugzilla.redhat.com/1350789): Buffer overflow when attempting to create filesystem using libgfapi as driver on OpenStack -- [#1351025](https://bugzilla.redhat.com/1351025): Implement API to get page aligned iobufs in iobuf.c -- [#1351151](https://bugzilla.redhat.com/1351151): ganesha.enable remains on in volume info file even after we disable nfs-ganesha on the cluster. -- [#1351154](https://bugzilla.redhat.com/1351154): nfs-ganesha disable doesn't delete nfs-ganesha folder from /var/run/gluster/shared_storage -- [#1351711](https://bugzilla.redhat.com/1351711): build: remove absolute paths from glusterfs spec file -- [#1352281](https://bugzilla.redhat.com/1352281): Issues reported by Coverity static analysis tool -- [#1352393](https://bugzilla.redhat.com/1352393): [FEAT] DHT - rebalance - rebalance status o/p should be different for 'fix-layout' option, it should not show 'Rebalanced-files' , 'Size', 'Scanned' etc as it is not migrating any files. -- [#1352632](https://bugzilla.redhat.com/1352632): qemu libgfapi clients hang when doing I/O -- [#1352817](https://bugzilla.redhat.com/1352817): [scale]: Bricks not started after node reboot. -- [#1352880](https://bugzilla.redhat.com/1352880): gluster volume info --xml returns 0 for nonexistent volume -- [#1353426](https://bugzilla.redhat.com/1353426): glusterd: glusterd provides stale port information when a volume is recreated with same brick path diff --git a/doc/release-notes/3.8.2.md b/doc/release-notes/3.8.2.md deleted file mode 100644 index 6fd434989d1..00000000000 --- a/doc/release-notes/3.8.2.md +++ /dev/null @@ -1,60 +0,0 @@ -# Release notes for Gluster 3.8.2 - -This is a bugfix release. The [Release Notes for 3.8.0](3.8.0.md) and -[3.8.1](3.8.1.md) contain a listing of all the new features that were added and -bugs fixed in the GlusterFS 3.8 stable release. 
- -## Bugs addressed - -A total of 54 patches have been merged, addressing 50 bugs: - -- [#1339928](https://bugzilla.redhat.com/1339928): Misleading error message on rebalance start when one of the glusterd instance is down -- [#1346133](https://bugzilla.redhat.com/1346133): tiering : Multiple brick processes crashed on tiered volume while taking snapshots -- [#1351878](https://bugzilla.redhat.com/1351878): client ID should logged when SSL connection fails -- [#1352771](https://bugzilla.redhat.com/1352771): [DHT]: Rebalance info for remove brick operation is not showing after glusterd restart -- [#1352926](https://bugzilla.redhat.com/1352926): gluster volume status <volume> client" isn't showing any information when one of the nodes in a 3-way Distributed-Replicate volume is shut down -- [#1353814](https://bugzilla.redhat.com/1353814): Bricks are starting when server quorum not met. -- [#1354250](https://bugzilla.redhat.com/1354250): Gluster fuse client crashed generating core dump -- [#1354395](https://bugzilla.redhat.com/1354395): rpc-transport: compiler warning format string -- [#1354405](https://bugzilla.redhat.com/1354405): process glusterd set TCP_USER_TIMEOUT failed -- [#1354429](https://bugzilla.redhat.com/1354429): [Bitrot] Need a way to set scrub interval to a minute, for ease of testing -- [#1354499](https://bugzilla.redhat.com/1354499): service file is executable -- [#1355609](https://bugzilla.redhat.com/1355609): [granular entry sh] - Clean up (stale) directory indices in the event of an `rm -rf` and also in the normal flow while a brick is down -- [#1355610](https://bugzilla.redhat.com/1355610): Fix timing issue in tests/bugs/glusterd/bug-963541.t -- [#1355639](https://bugzilla.redhat.com/1355639): [Bitrot]: Scrub status- Certain fields continue to show previous run's details, even if the current run is in progress -- [#1356439](https://bugzilla.redhat.com/1356439): Upgrade from 3.7.8 to 3.8.1 doesn't regenerate the volfiles -- [#1357257](https://bugzilla.redhat.com/1357257): observing " Too many levels of symbolic links" after adding bricks and then issuing a replace brick -- [#1357773](https://bugzilla.redhat.com/1357773): [georep]: If a georep session is recreated the existing files which are deleted from slave doesn't get sync again from master -- [#1357834](https://bugzilla.redhat.com/1357834): Gluster/NFS does not accept dashes in hostnames in exports/netgroups files -- [#1357975](https://bugzilla.redhat.com/1357975): [Bitrot+Sharding] Scrub status shows incorrect values for 'files scrubbed' and 'files skipped' -- [#1358262](https://bugzilla.redhat.com/1358262): Trash translator fails to create 'internal_op' directory under already existing trash directory -- [#1358591](https://bugzilla.redhat.com/1358591): Fix spurious failure of tests/bugs/glusterd/bug-1111041.t -- [#1359020](https://bugzilla.redhat.com/1359020): [Bitrot]: Sticky bit files considered and skipped by the scrubber, instead of getting ignored. -- [#1359364](https://bugzilla.redhat.com/1359364): changelog/rpc: Memory leak- rpc_clnt_t object is never freed -- [#1359625](https://bugzilla.redhat.com/1359625): remove hardcoding in get_aux function -- [#1359654](https://bugzilla.redhat.com/1359654): Polling failure errors getting when volume is started&stopped with SSL enabled setup. -- [#1360122](https://bugzilla.redhat.com/1360122): Tiering related core observed with "uuid_is_null () message". 
-- [#1360138](https://bugzilla.redhat.com/1360138): [Stress/Scale] : I/O errors out from gNFS mount points during high load on an erasure coded volume,Logs flooded with Error messages. -- [#1360174](https://bugzilla.redhat.com/1360174): IO error seen with Rolling or non-disruptive upgrade of an distribute-disperse(EC) volume from 3.7.5 to 3.7.9 -- [#1360556](https://bugzilla.redhat.com/1360556): afr coverity fixes -- [#1360573](https://bugzilla.redhat.com/1360573): Fix spurious failures in split-brain-favorite-child-policy.t -- [#1360574](https://bugzilla.redhat.com/1360574): multiple failures of tests/bugs/disperse/bug-1236065.t -- [#1360575](https://bugzilla.redhat.com/1360575): Fix spurious failures in ec.t -- [#1360576](https://bugzilla.redhat.com/1360576): [Disperse volume]: IO hang seen on mount with file ops -- [#1360579](https://bugzilla.redhat.com/1360579): tests: ./tests/bitrot/br-stub.t fails intermittently -- [#1360985](https://bugzilla.redhat.com/1360985): [SNAPSHOT]: The PID for snapd is displayed even after snapd process is killed. -- [#1361449](https://bugzilla.redhat.com/1361449): Direct io to sharded files fails when on zfs backend -- [#1361483](https://bugzilla.redhat.com/1361483): posix: leverage FALLOC_FL_ZERO_RANGE in zerofill fop -- [#1361665](https://bugzilla.redhat.com/1361665): Memory leak observed with upcall polling -- [#1362025](https://bugzilla.redhat.com/1362025): Add output option `--xml` to man page of gluster -- [#1362065](https://bugzilla.redhat.com/1362065): tests: ./tests/bitrot/bug-1244613.t fails intermittently -- [#1362069](https://bugzilla.redhat.com/1362069): [GSS] Rebalance crashed -- [#1362198](https://bugzilla.redhat.com/1362198): [tiering]: Files of size greater than that of high watermark level should not be promoted -- [#1363598](https://bugzilla.redhat.com/1363598): File not found errors during rpmbuild: /var/lib/glusterd/hooks/1/delete/post/S57glusterfind-delete-post.py{c,o} -- [#1364326](https://bugzilla.redhat.com/1364326): Spurious failure in tests/bugs/glusterd/bug-1089668.t -- [#1364329](https://bugzilla.redhat.com/1364329): Glusterd crashes upon receiving SIGUSR1 -- [#1364365](https://bugzilla.redhat.com/1364365): Bricks doesn't come online after reboot [ Brick Full ] -- [#1364497](https://bugzilla.redhat.com/1364497): posix: honour fsync flags in posix_do_zerofill -- [#1365265](https://bugzilla.redhat.com/1365265): Glusterd not operational due to snapshot conflicting with nfs-ganesha export file in "/var/lib/glusterd/snaps" -- [#1365742](https://bugzilla.redhat.com/1365742): inode leak in brick process -- [#1365743](https://bugzilla.redhat.com/1365743): GlusterFS - Memory Leak - High Memory Utilization diff --git a/doc/release-notes/3.8.3.md b/doc/release-notes/3.8.3.md deleted file mode 100644 index b3785bf3fbd..00000000000 --- a/doc/release-notes/3.8.3.md +++ /dev/null @@ -1,51 +0,0 @@ -# Release notes for Gluster 3.8.3 - -This is a bugfix release. The [Release Notes for 3.8.0](3.8.0.md), -[3.8.1](3.8.1.md) and [3.8.2](3.8.2.md) contain a listing of all the new -features that were added and bugs fixed in the GlusterFS 3.8 stable release. - -## Out of Order release to address a severe usability regression - -Due to a major regression that was not caught and reported by any of the -testing that has been performed, this release is done outside of the normal -schedule. 
- -The main reason to release 3.8.3 earlier than planned is to fix [bug -1366813](https://bugzilla.redhat.com/1366813): - -> On restarting GlusterD or rebooting a GlusterFS server, only the bricks of -> the first volume get started. The bricks of the remaining volumes are not -> started. This is a regression caused by a change in GlusterFS-3.8.2. -> -> This regression breaks automatic start of volumes on rebooting servers, and -> leaves the volumes inoperable. GlusterFS volumes could be left in an -> inoperable state after upgrading to 3.8.2, as upgrading involves restarting -> GlusterD. -> -> Users can forcefully start the remaining volumes by running the -> `gluster volume start <name> force` command. - -## Bugs addressed - -A total of 24 patches have been merged, addressing 21 bugs: - -- [#1357767](https://bugzilla.redhat.com/1357767): Wrong XML output for Volume Options -- [#1362540](https://bugzilla.redhat.com/1362540): glfs_fini() crashes with SIGSEGV -- [#1364382](https://bugzilla.redhat.com/1364382): RFE:nfs-ganesha:prompt the nfs-ganesha disable cli to let user provide "yes or no" option -- [#1365734](https://bugzilla.redhat.com/1365734): Mem leak in meta_default_readv in meta xlators -- [#1365742](https://bugzilla.redhat.com/1365742): inode leak in brick process -- [#1365756](https://bugzilla.redhat.com/1365756): [SSL] : gluster v set help does not show ssl options -- [#1365821](https://bugzilla.redhat.com/1365821): IO ERROR when multiple graph switches -- [#1365864](https://bugzilla.redhat.com/1365864): gfapi: use const qualifier for glfs_*timens() -- [#1365879](https://bugzilla.redhat.com/1365879): [libgfchangelog]: If changelogs are not available for the requested time range, no proper error message -- [#1366281](https://bugzilla.redhat.com/1366281): glfs_truncate missing -- [#1366440](https://bugzilla.redhat.com/1366440): [AFR]: Files not available in the mount point after converting Distributed volume type to Replicated one. -- [#1366482](https://bugzilla.redhat.com/1366482): SAMBA-DHT : Crash seen while rename operations in cifs mount and windows access of share mount -- [#1366489](https://bugzilla.redhat.com/1366489): "heal info --xml" not showing the brick name of offline bricks. -- [#1366813](https://bugzilla.redhat.com/1366813): Second gluster volume is offline after daemon restart or server reboot -- [#1367272](https://bugzilla.redhat.com/1367272): [HC]: After bringing down and up of the bricks VM's are getting paused -- [#1367297](https://bugzilla.redhat.com/1367297): Error and warning messages related to xlator/features/snapview-client.so adding up to the client log on performing IO operations -- [#1367363](https://bugzilla.redhat.com/1367363): Log EEXIST errors at DEBUG level -- [#1368053](https://bugzilla.redhat.com/1368053): [geo-rep] Stopped geo-rep session gets started automatically once all the master nodes are upgraded -- [#1368423](https://bugzilla.redhat.com/1368423): core: use <sys/sysmacros.h> for makedev(3), major(3), minor(3) -- [#1368738](https://bugzilla.redhat.com/1368738): gfapi-trunc test shouldn't be .t diff --git a/doc/release-notes/3.8.4.md b/doc/release-notes/3.8.4.md deleted file mode 100644 index 2598f0c121c..00000000000 --- a/doc/release-notes/3.8.4.md +++ /dev/null @@ -1,33 +0,0 @@ -# Release notes for Gluster 3.8.4 - -This is a bugfix release.
The [Release Notes for 3.8.0](3.8.0.md), -[3.8.1](3.8.1.md), [3.8.2](3.8.2.md) and [3.8.3](3.8.3.md) contain a listing of -all the new features that were added and bugs fixed in the GlusterFS 3.8 stable -release. - -## Bugs addressed - -A total of 23 patches have been merged, addressing 22 bugs: - -- [#1332424](https://bugzilla.redhat.com/1332424): geo-rep: address potential leak of memory -- [#1357760](https://bugzilla.redhat.com/1357760): Geo-rep silently ignores config parser errors -- [#1366496](https://bugzilla.redhat.com/1366496): 1 mkdir generates tons of log messages from dht xlator -- [#1366746](https://bugzilla.redhat.com/1366746): EINVAL errors while aggregating the directory size by quotad -- [#1368841](https://bugzilla.redhat.com/1368841): Applications not calling glfs_h_poll_upcall() have upcall events cached for no use -- [#1368918](https://bugzilla.redhat.com/1368918): tests/bugs/cli/bug-1320388.t: Infrequent failures -- [#1368927](https://bugzilla.redhat.com/1368927): Error: quota context not set inode (gfid:nnn) [Invalid argument] -- [#1369042](https://bugzilla.redhat.com/1369042): thread CPU saturation limiting throughput on write workloads -- [#1369187](https://bugzilla.redhat.com/1369187): fix bug in protocol/client lookup callback -- [#1369328](https://bugzilla.redhat.com/1369328): [RFE] Add a count of snapshots associated with a volume to the output of the vol info command -- [#1369372](https://bugzilla.redhat.com/1369372): gluster snap status xml output shows incorrect details when the snapshots are in deactivated state -- [#1369517](https://bugzilla.redhat.com/1369517): rotated FUSE mount log is using to populate the information after log rotate. -- [#1369748](https://bugzilla.redhat.com/1369748): Memory leak with a replica 3 arbiter 1 configuration -- [#1370172](https://bugzilla.redhat.com/1370172): protocol/server: readlink rsp xdr failed while readlink got an error -- [#1370390](https://bugzilla.redhat.com/1370390): Locks xlators is leaking fdctx in pl_release() -- [#1371194](https://bugzilla.redhat.com/1371194): segment fault while join thread reaper_thr in fini() -- [#1371650](https://bugzilla.redhat.com/1371650): [Open SSL] : Unable to mount an SSL enabled volume via SMB v3/Ganesha v4 -- [#1371912](https://bugzilla.redhat.com/1371912): `gluster system:: uuid get` hangs -- [#1372728](https://bugzilla.redhat.com/1372728): Node remains in stopped state in pcs status with "/usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]" messages in logs. -- [#1373530](https://bugzilla.redhat.com/1373530): Minor improvements and cleanup for the build system -- [#1374290](https://bugzilla.redhat.com/1374290): "gluster vol status all clients --xml" doesn't generate xml if there is a failure in between -- [#1374565](https://bugzilla.redhat.com/1374565): [Bitrot]: Recovery fails of a corrupted hardlink (and the corresponding parent file) in a disperse volume diff --git a/doc/release-notes/3.8.5.md b/doc/release-notes/3.8.5.md deleted file mode 100644 index 459148895c4..00000000000 --- a/doc/release-notes/3.8.5.md +++ /dev/null @@ -1,37 +0,0 @@ -# Release notes for Gluster 3.8.5 - -This is a bugfix release. The [Release Notes for 3.8.0](3.8.0.md), -[3.8.1](3.8.1.md), [3.8.2](3.8.2.md), [3.8.3](3.8.3.md) and [3.8.4](3.8.4.md) -contain a listing of all the new features that were added and bugs fixed in the -GlusterFS 3.8 stable release.
- -## Bugs addressed - -A total of 27 patches have been merged, addressing 26 bugs: - -- [#1373723](https://bugzilla.redhat.com/1373723): glusterd experiencing repeated connect/disconnect messages when shd is down -- [#1374135](https://bugzilla.redhat.com/1374135): Rebalance is not considering the brick sizes while fixing the layout -- [#1374280](https://bugzilla.redhat.com/1374280): rpc/xdr: generated files are filtered with a sed extended regex -- [#1374573](https://bugzilla.redhat.com/1374573): gluster fails to propagate permissions on the root of a gluster export when adding bricks -- [#1374580](https://bugzilla.redhat.com/1374580): Geo-rep worker Faulty with OSError: [Errno 21] Is a directory -- [#1374596](https://bugzilla.redhat.com/1374596): [geo-rep]: AttributeError: 'Popen' object has no attribute 'elines' -- [#1374610](https://bugzilla.redhat.com/1374610): geo-replication *changes.log does not respect the log-level configured -- [#1374627](https://bugzilla.redhat.com/1374627): Worker crashes with EINVAL errors -- [#1374632](https://bugzilla.redhat.com/1374632): [geo-replication]: geo-rep Status is not showing bricks from one of the nodes -- [#1374640](https://bugzilla.redhat.com/1374640): glusterfs: create a directory with 0464 mode return EIO error -- [#1375043](https://bugzilla.redhat.com/1375043): bug-963541.t spurious failure -- [#1375096](https://bugzilla.redhat.com/1375096): dht: Update stbuf from servers having layout -- [#1375098](https://bugzilla.redhat.com/1375098): Value of `replica.split-brain-status' attribute of a directory in metadata split-brain in a dist-rep volume reads that it is not in split-brain -- [#1375542](https://bugzilla.redhat.com/1375542): [geo-rep]: defunct tar process while using tar+ssh sync -- [#1375565](https://bugzilla.redhat.com/1375565): Detach tier commit is allowed when detach tier start goes into failed state -- [#1375959](https://bugzilla.redhat.com/1375959): Files not being opened with o_direct flag during random read operation (Glusterfs 3.8.2) -- [#1375990](https://bugzilla.redhat.com/1375990): Enable gfapi test cases in Gluster upstream regression -- [#1376385](https://bugzilla.redhat.com/1376385): /var/tmp/rpm-tmp.KPCugR: line 2: /bin/systemctl: No such file or directory -- [#1376390](https://bugzilla.redhat.com/1376390): Spurious regression in tests/basic/gfapi/bug1291259.t -- [#1377193](https://bugzilla.redhat.com/1377193): Poor smallfile read performance on Arbiter volume compared to Replica 3 volume -- [#1377290](https://bugzilla.redhat.com/1377290): The GlusterFS Callback RPC-calls always use RPC/XID 42 -- [#1379216](https://bugzilla.redhat.com/1379216): rpc_clnt will sometimes not reconnect when using encryption -- [#1379284](https://bugzilla.redhat.com/1379284): warning messages seen in glusterd logs for each 'gluster volume status' command -- [#1379708](https://bugzilla.redhat.com/1379708): gfapi: Fix fd ref leaks -- [#1383694](https://bugzilla.redhat.com/1383694): GlusterFS fails to build on old Linux distros with linux/oom.h missing -- [#1383882](https://bugzilla.redhat.com/1383882): client ID should logged when SSL connection fails diff --git a/doc/release-notes/3.8.6.md b/doc/release-notes/3.8.6.md deleted file mode 100644 index 1ad77f3dbf8..00000000000 --- a/doc/release-notes/3.8.6.md +++ /dev/null @@ -1,60 +0,0 @@ -# Release notes for Gluster 3.8.6 - -This is a bugfix release. 
The [Release Notes for 3.8.0](3.8.0.md), -[3.8.1](3.8.1.md), [3.8.2](3.8.2.md), [3.8.3](3.8.3.md), [3.8.4](3.8.4.md) and -[3.8.5](3.8.5.md) contain a listing of all the new features that were added and -bugs fixed in the GlusterFS 3.8 stable release. - - -## Change in port allocation, may affect deployments with strict firewalls - -**Problem description**: GlusterD used to assume that the brick port which -was previously allocated to a brick, would still be available, and in doing so -would reuse the port for the brick without registering with the port map -server. The port map server would not be aware of the brick reusing the same -port, and try to allocate it to another process, and in turn result in that -process' failure to connect to the port. - -**Fix and port usage changes**: With the fix, we force GlusterD to unregister -a port previously used by the brick, and register a new port with the port map -server and then use it. As a result of this change, there will be no conflict -between processes competing over the same port, thereby fixing the issue. Also -because of this change, a brick process on restart is not guaranteed to reuse -the same port it used to be connected to. - - -## Bugs addressed - -A total of 34 patches have been merged, addressing 31 bugs: - -- [#1336376](https://bugzilla.redhat.com/1336376): Sequential volume start&stop is failing with SSL enabled setup. -- [#1347717](https://bugzilla.redhat.com/1347717): removal of file from nfs mount crashs ganesha server -- [#1369766](https://bugzilla.redhat.com/1369766): glusterd: add brick command should re-use the port for listening which is freed by remove-brick. -- [#1371397](https://bugzilla.redhat.com/1371397): [Disperse] dd + rm + ls lead to IO hang -- [#1375125](https://bugzilla.redhat.com/1375125): arbiter volume write performance is bad. -- [#1377448](https://bugzilla.redhat.com/1377448): glusterd: Display proper error message and fail the command if S32gluster_enable_shared_storage.sh hook script is not present during gluster volume set all cluster.enable-shared-storage <enable/disable> command -- [#1384345](https://bugzilla.redhat.com/1384345): usage text is wrong for use-readdirp mount default -- [#1384356](https://bugzilla.redhat.com/1384356): Polling failure errors getting when volume is started&stopped with SSL enabled setup. -- [#1385442](https://bugzilla.redhat.com/1385442): invalid argument warning messages seen in fuse client logs 2016-09-30 06:34:58.938667] W [dict.c:418ict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x58722) 0-dict: !this || !value for key=link-count [Invalid argument] -- [#1385620](https://bugzilla.redhat.com/1385620): Recording (ffmpeg) processes on FUSE get hung -- [#1386071](https://bugzilla.redhat.com/1386071): Spurious permission denied problems observed -- [#1387976](https://bugzilla.redhat.com/1387976): Continuous warning messages getting when one of the cluster node is down on SSL setup.
-- [#1388354](https://bugzilla.redhat.com/1388354): Memory Leaks in snapshot code path -- [#1388580](https://bugzilla.redhat.com/1388580): crypt: changes needed for openssl-1.1 (coming in Fedora 26) -- [#1388948](https://bugzilla.redhat.com/1388948): glusterfs can't self heal character dev file for invalid dev_t parameters -- [#1390838](https://bugzilla.redhat.com/1390838): write-behind: flush stuck by former failed write -- [#1390870](https://bugzilla.redhat.com/1390870): DHT: Rebalance- Misleading log messages from __dht_check_free_space function -- [#1391450](https://bugzilla.redhat.com/1391450): md-cache: Invalidate cache entry in case of OPEN with O_TRUNC -- [#1392288](https://bugzilla.redhat.com/1392288): gfapi clients crash while using async calls due to double fd_unref -- [#1392364](https://bugzilla.redhat.com/1392364): trashcan max file limit cannot go beyond 1GB -- [#1392716](https://bugzilla.redhat.com/1392716): Quota version not changing in the quota.conf after upgrading to 3.7.1 from 3.6.1 -- [#1392846](https://bugzilla.redhat.com/1392846): Hosted Engine VM paused post replace-brick operation -- [#1392868](https://bugzilla.redhat.com/1392868): The FUSE client log is filling up with posix_acl_default and posix_acl_access messages -- [#1393630](https://bugzilla.redhat.com/1393630): Better logging when reporting failures of the kind "<file-path> Failing MKNOD as quorum is not met" -- [#1393682](https://bugzilla.redhat.com/1393682): stat of file is hung with possible deadlock -- [#1394108](https://bugzilla.redhat.com/1394108): Continuous errors getting in the mount log when the volume mount server glusterd is down. -- [#1394187](https://bugzilla.redhat.com/1394187): SMB[md-cache Private Build]:Error messages in brick logs related to upcall_cache_invalidate gf_uuid_is_null -- [#1394226](https://bugzilla.redhat.com/1394226): "nfs-grace-monitor" timed out messages observed -- [#1394883](https://bugzilla.redhat.com/1394883): Failed to enable nfs-ganesha after disabling nfs-ganesha cluster -- [#1395627](https://bugzilla.redhat.com/1395627): Labelled geo-rep checkpoints hide geo-replication status -- [#1396418](https://bugzilla.redhat.com/1396418): [md-cache]: All bricks crashed while performing symlink and rename from client at the same time diff --git a/doc/release-notes/3.8.7.md b/doc/release-notes/3.8.7.md deleted file mode 100644 index 5a2fc980297..00000000000 --- a/doc/release-notes/3.8.7.md +++ /dev/null @@ -1,76 +0,0 @@ -# Release notes for Gluster 3.8.7 - -This is a bugfix release. The [Release Notes for 3.8.0](3.8.0.md), -[3.8.1](3.8.1.md), [3.8.2](3.8.2.md), [3.8.3](3.8.3.md), [3.8.4](3.8.4.md), -[3.8.5](3.8.5.md) and [3.8.6](3.8.6.md) contain a listing of all the new -features that were added and bugs fixed in the GlusterFS 3.8 stable release. - - -## New CLI option for granular entry heal enablement/disablement - -When there are already existing non-granular indices created that are yet to be -healed, if granular-entry-heal option is toggled from `off` to `on`, AFR -self-heal whenever it kicks in, will try to look for granular indices in -`entry-changes`. Because of the absence of name indices, granular entry healing -logic will fail to heal these directories, and worse yet unset pending extended -attributes with the assumption that are no entries that need heal. 
- -To get around this, a new CLI is introduced which invokes the glfsheal program -to figure out whether, at the time an attempt is made to enable granular entry heal, -there are pending heals on the volume OR one or more bricks are -down. If either of them is true, the command fails with an -appropriate error (see the illustrative session at the end of this section). - - # gluster volume heal <VOL> granular-entry-heal {enable,disable} - -With this change, the user does not need to worry about when to enable/disable -the option: the CLI command itself performs the necessary checks before -allowing the "enable" command to proceed. - -What are those checks? -* Whether heal is already needed on the volume -* Whether any of the replicas is down - -In both cases, the command fails, since AFR will be switching -from creating heal indices (markers for files that need heal) under -`.glusterfs/indices/xattrop` to creating them under -`.glusterfs/indices/entry-changes`. -The moment this switch happens, the self-heal daemon will cease to crawl an -entire directory that needs heal and will instead look for the exact names -needing heal under `.glusterfs/indices/entry-changes`. This -might cause self-heal to miss healing some entries (because, before the -switch, directories already needing heal won't have any indices under -`.glusterfs/indices/entry-changes`) and mistakenly unset the pending heal -xattrs even though the individual replicas are not in sync. - -When should users enable this option? -* When they want to use the feature ;) -* It is useful for faster self-healing in use cases with a large number of - files under a single directory. - For example, it is useful in VM use cases with smaller shard sizes, given - that all shards are created under the single directory `.shard`. When a shard - is created while a replica was down, self-heal, because it maintains granular - indices, will know exactly which shard to recreate once the replica is back up, - as opposed to crawling the entire `.shard` directory to find out the - same information.
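For illustration, the session below sketches the intended flow on a hypothetical replica 3 volume named `testvol` (the volume name is made up, command output is omitted, and the exact error wording may differ):

    # gluster volume heal testvol info
    # gluster volume status testvol
    # gluster volume heal testvol granular-entry-heal enable

The first two commands confirm that no entries are pending heal and that all bricks are online; only then is the `enable` expected to succeed. If either check fails, the command is rejected and can simply be retried after the pending heals complete and the bricks are back up.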
- -## Bugs addressed - -A total of 16 patches have been merged, addressing 15 bugs: - -- [#1395652](https://bugzilla.redhat.com/1395652): ganesha-ha.conf --status should validate if the VIPs are assigned to right nodes -- [#1397663](https://bugzilla.redhat.com/1397663): libgfapi core dumps -- [#1398501](https://bugzilla.redhat.com/1398501): [granular entry sh] - Provide a CLI to enable/disable the feature that checks that there are no heals pending before allowing the operation -- [#1399018](https://bugzilla.redhat.com/1399018): performance.read-ahead on results in processes on client stuck in IO wait -- [#1399088](https://bugzilla.redhat.com/1399088): geo-replica slave node goes faulty for non-root user session due to fail to locate gluster binary -- [#1399090](https://bugzilla.redhat.com/1399090): [geo-rep]: Worker crashes seen while renaming directories in loop -- [#1399130](https://bugzilla.redhat.com/1399130): SEEK_HOLE/ SEEK_DATA doesn't return the correct offset -- [#1399635](https://bugzilla.redhat.com/1399635): Refresh config fails while exporting subdirectories within a volume -- [#1400459](https://bugzilla.redhat.com/1400459): [USS,SSL] .snaps directory is not reachable when I/O encryption (SSL) is enabled -- [#1400573](https://bugzilla.redhat.com/1400573): Ganesha services are not stopped when pacemaker quorum is lost -- [#1400802](https://bugzilla.redhat.com/1400802): glusterfs_ctx_defaults_init is re-initializing ctx->locks -- [#1400927](https://bugzilla.redhat.com/1400927): Memory leak when self healing daemon queue is full -- [#1402672](https://bugzilla.redhat.com/1402672): Getting the warning message while erasing the gluster "glusterfs-server" package. -- [#1403192](https://bugzilla.redhat.com/1403192): Files remain unhealed forever if shd is disabled and re-enabled while healing is in progress. -- [#1403646](https://bugzilla.redhat.com/1403646): self-heal not happening, as self-heal info lists the same pending shards to be healed
