Merge branch 'upstream'

Conflicts:
	glusterfs.spec.in
	xlators/mgmt/glusterd/src/Makefile.am
	xlators/mgmt/glusterd/src/glusterd-utils.c
	xlators/mgmt/glusterd/src/glusterd.h

Change-Id: I27bdcf42b003cfc42d6ad981bd2bf8180176806d
author: Jeff Darcy <jdarcy@redhat.com> 2014-04-22 15:37:09 +0000
committer: Jeff Darcy <jdarcy@redhat.com> 2014-04-22 15:37:09 +0000
commit: a827c5eab32a43ade5551259ea56a6a1af7e861b (patch)
tree: e6707df68f72baa8645210ba931272285116ad85 /doc
parent: 46d333783a968ab39e0beade9c7a1eec8035f8b1 (diff)
parent: 99bfc2a2a1689da1e173cb2f8ef54d2b09ef3a5d (diff)
10 files changed, 575 insertions, 3 deletions
diff --git a/doc/admin-guide/en-US/markdown/admin_distributed_geo_rep.md b/doc/admin-guide/en-US/markdown/admin_distributed_geo_rep.md
new file mode 100644
index 000000000..1ec4f28ae
--- /dev/null
+++ b/doc/admin-guide/en-US/markdown/admin_distributed_geo_rep.md
@@ -0,0 +1,114 @@
+# Distributed Geo-Replication in glusterfs-3.5
+
+This is an admin how-to guide for the new distributed geo-replication being released as part of glusterfs-3.5.
+
+##### Note:
+This article is targeted at users/admins who want to try the new geo-replication without going deep into the internals and technology used.
+
+### How is it different from earlier geo-replication?
+
+- Up until now, only one node of the master volume participated in geo-replication. That node did all the data syncing while the other nodes in the cluster sat idle. With distributed geo-replication, each node of the master volume takes responsibility for syncing the data present on that node. In a replicated configuration, one node of each replica pair is 'Active' and syncs the data while the other is 'Passive'; the 'Passive' node becomes 'Active' only when its 'Active' partner goes down. This way the new geo-rep uses all the nodes in the volume and removes the bottleneck of syncing from a single node.
+- The change detection mechanism has also been improved. Until now, geo-rep crawled through the glusterfs file system to figure out which files needed to be synced. Because crawling a filesystem can be expensive, this was a major performance bottleneck. With distributed geo-rep, the files that need to be synced are identified through the changelog xlator. The changelog xlator journals all the fops that modify files, and geo-rep consumes these journals to identify the files that need to be synced.
+- A new syncing method, tar+ssh, has been introduced to improve performance for a few specific data sets. You can switch between the rsync and tar+ssh syncing methods via the CLI to suit your data set. tar+ssh is better suited to data sets that have a large number of small files.
+
+
+### Using Distributed geo-replication:
+
+#### Prerequisites:
+- There should be password-less ssh set up between at least one node in the master volume and one node in the slave volume. The geo-rep create command should be executed from the master node that has this password-less ssh access to the slave.
+
+- Unlike previous versions, the slave **must** be a gluster volume; it cannot be a directory. Both the master and slave volumes must have been created and started before creating the geo-rep session.
+
+#### Creating secret pem pub file
+- Execute the command below from the node where you set up password-less ssh to the slave. This creates the secret pem pub file, which holds the RSA keys of all the nodes in the master volume. When the geo-rep create command is executed, glusterd uses this file to establish geo-rep specific ssh connections.
+```sh
+gluster system:: execute gsec_create
+```
+
+#### Creating geo-replication session.
+Create a geo-rep session between the master and slave volumes using the following command. The node on which this command is executed and the <slave_host> specified in the command should have password-less ssh set up between them. The push-pem option uses the secret pem pub file created earlier to establish geo-rep specific password-less ssh from each node of the master to each node of the slave.
+```sh
+gluster volume geo-replication <master_volume> <slave_host>::<slave_volume> create push-pem [force]
+```
+
+If the total available size in the slave volume is less than the total size of the master, the command will fail with an error message. In such cases the 'force' option can be used.
+
+#### Starting a geo-rep session
+There is no change in this command from previous versions to this version.
+```sh
+gluster volume geo-replication <master_volume> <slave_host>::<slave_volume> start
+```
+This command starts the session: the gsyncd monitor process is started, which in turn spawns gsync worker processes whenever required. It also turns on the changelog xlator (if it is not already on), which starts recording all the changes on each of the glusterfs bricks. If the master is empty when geo-rep starts, the change detection mechanism will be changelog; otherwise it will be xsync (changes are identified by crawling through the filesystem). Once the initial data has been synced to the slave, the change detection mechanism is switched to changelog.
+
+#### Status of geo-replication
+
+gluster now has variants of status command.
+
+```sh
+gluster volume geo-replication <master_volume> <slave_host>::<slave_volume> status
+```
+
+This displays the status of the session from each brick of the master to each brick of the slave node.
+
+If you want a more detailed status, run 'status detail':
+
+```sh
+gluster volume geo-replication <master_volume> <slave_host>::<slave_volume> status detail
+```
+
+This command displays extra information such as total files synced, files that still need to be synced, deletes pending, etc.
+
+#### Stopping geo-replication session
+
+This command stops all geo-rep related processes, i.e. the gsyncd monitor and worker processes. Note that changelog will **not** be turned off by this command.
+
+```sh
+gluster volume geo-replication <master_volume> <slave_host>::<slave_volume> stop [force]
+```
+The force option is to be used when one of the nodes (or glusterd on one of the nodes) is down. Once stopped, the session can be restarted at any time. Note that when the session is restarted, the change detection mechanism falls back to xsync mode. This happens even though changelog keeps generating journals while the geo-rep session is stopped.
+
+#### Deleting geo-replication session
+
+Now you can delete the glusterfs geo-rep session. This will delete all the config data associated with the geo-rep session.
+
+```sh
+gluster volume geo-replication <master_volume> <slave_host>::<slave_volume> delete
+```
+
+This deletes all the gsync conf files on each of the nodes. It returns failure if any of the nodes is down. And unlike geo-rep stop, there is no 'force' option with this.
+
+#### Changing the config values
+
+There are some configuration values which can be changed using the CLI. And you can see all the current config values with following command.
+
+```sh
+gluster volume geo-replication <master_volume> <slave_host>::<slave_volume> config
+```
+
+You can also check a single value, such as log-file or change-detector:
+
+```sh
+gluster volume geo-replication <master_volume> <slave_host>::<slave_volume> config log-file
+```
+```sh
+gluster volume geo-replication <master_volume> <slave_host>::<slave_volume> config change-detector
+```
+```sh
+gluster volume geo-replication <master_volume> <slave_host>::<slave_volume> config working-dir
+```
+To set a new value for a key, just provide the new value. Note that not all config values can be changed; some cannot be modified.
+
+```sh
+gluster volume geo-replication <master_volume> <slave_host>::<slave_volume> config change-detector xsync
+```
+Make sure you provide a valid value for the config key. If your data set has a large number of small files, you can use tar+ssh as the syncing method. Note that if the geo-rep session is running, this restarts gsyncd.
+
+```sh
+gluster volume geo-replication <master_volume> <slave_host>::<slave_volume> config use-tarssh true
+```
+Resetting a value to its default is also simple.
+
+```sh
+gluster volume geo-replication <master_volume> <slave_host>::<slave_volume> config \!use-tarssh
+```
+That makes the config key (use-tarssh in this case) fall back to its default value.
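The session lifecycle documented in admin_distributed_geo_rep.md above (gsec_create, create push-pem, start, status detail) can be sketched as a dry-run script. This is a minimal sketch that is not part of the patch: the function names `georep_cmd` and `georep_plan` and the values `mastervol`, `slavehost` and `slavevol` are hypothetical, and the script only prints the commands from the guide instead of running them.

```shell
#!/bin/sh
# Hypothetical dry-run helpers: they only print gluster commands, nothing is executed.

georep_cmd() {
    # $1 = master volume, $2 = slave host, $3 = slave volume, rest = sub-command
    master=$1
    slave_host=$2
    slave_vol=$3
    shift 3
    echo "gluster volume geo-replication ${master} ${slave_host}::${slave_vol} $*"
}

georep_plan() {
    # Order taken from the guide: pem file, create, start, then status.
    echo "gluster system:: execute gsec_create"
    georep_cmd "$1" "$2" "$3" create push-pem
    georep_cmd "$1" "$2" "$3" start
    georep_cmd "$1" "$2" "$3" status detail
}

# Placeholder names, assumed for illustration only.
georep_plan mastervol slavehost slavevol
```

Printing the commands first makes it easy to review the `<slave_host>::<slave_volume>` argument before anything touches a real cluster.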
diff --git a/doc/admin-guide/en-US/markdown/admin_managing_snapshots.md b/doc/admin-guide/en-US/markdown/admin_managing_snapshots.md
new file mode 100644
index 000000000..e76ee9151
--- /dev/null
+++ b/doc/admin-guide/en-US/markdown/admin_managing_snapshots.md
@@ -0,0 +1,66 @@
+Managing GlusterFS Volume Snapshots
+==========================
+
+This section describes how to perform common GlusterFS volume snapshot
+management operations.
+
+Pre-requisites
+=====================
+
+The GlusterFS volume snapshot feature is based on thinly provisioned LVM snapshots.
+To make use of the snapshot feature, a GlusterFS volume should fulfill the following
+pre-requisites:
+
+* Each brick should be on an independent thinly provisioned LVM.
+* The brick LVM should not contain any data other than the brick.
+* None of the bricks should be on a thick LVM.
+
+
+Snapshot Management
+=====================
+
+
+**Snapshot creation**
+
+*gluster snapshot create \<vol-name\> \[-n \<snap-name\>\] \[-d \<description\>\]*
+
+This command creates a snapshot of a GlusterFS volume. The user can provide a snap-name and a description to identify the snap. The description cannot be more than 1024 characters.
+
+The volume should be present and in the started state.
+
+**Restoring snaps**
+
+*gluster snapshot restore -v \<vol-name\> \<snap-name\>*
+
+This command restores an already taken snapshot of a GlusterFS volume. Snapshot restore is an offline activity; therefore, if the volume is online, the restore operation will fail.
+
+Once the snapshot is restored, it will be deleted from the list of snapshots.
+
+**Deleting snaps**
+
+*gluster snapshot delete \<volname\> -s \<snap-name\> \[force\]*
+
+This command will delete the specified snapshot.
+
+**Listing of available snaps**
+
+*gluster snapshot list \[\<volname\> \[-s \<snap-name\>\]\]*
+
+This command lists all snapshots taken, or only those of a specified volume. If snap-name is provided, it lists the details of that snap.
+
+**Configuring the snapshot behavior**
+
+*gluster snapshot config \[\<vol-name | all\>\]*
+
+This command displays the existing config values for a volume. If the volume name is not provided, the config values of all volumes are displayed. The system config is displayed irrespective of the volume name.
+
+*gluster snapshot config \<vol-name | all\> \[\<snap-max-hard-limit\> \<count\>\] \[\<snap-max-soft-limit\> \<percentage\>\]*
+
+The above command can be used to change the existing config values. If vol-name is provided then config value of that volume is changed, else it will set/change the system limit.
+
+The system limit is the default value of the config for all volumes. A volume-specific limit cannot exceed the system limit. If a volume-specific limit is not provided, the system limit is used.
+
+If any of these limits is decreased and the current snap count of the system/volume is more than the limit, the command will fail. If the user still wants to decrease the limit, the force option should be used.
+
+
+
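The 1024-character limit on snapshot descriptions in admin_managing_snapshots.md above can be checked before the CLI is called. A minimal sketch, not part of the patch: `snap_create_cmd` is a hypothetical helper, the command syntax is copied from the document as written, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: validates the description length, then prints
# the snapshot create command in the syntax given in the document.

snap_create_cmd() {
    # $1 = volume name, $2 = snap name, $3 = description
    if [ "${#3}" -gt 1024 ]; then
        echo "description longer than 1024 characters" >&2
        return 1
    fi
    echo "gluster snapshot create $1 -n $2 -d \"$3\""
}

# Placeholder values, assumed for illustration only.
snap_create_cmd vol1 snap1 "state before upgrade"
```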
diff --git a/doc/features/brick-failure-detection.md b/doc/features/brick-failure-detection.md
new file mode 100644
index 000000000..24f2a18f3
--- /dev/null
+++ b/doc/features/brick-failure-detection.md
@@ -0,0 +1,67 @@
+# Brick Failure Detection
+
+This feature attempts to identify storage/file system failures and disable the failed brick without disrupting the remainder of the node's operation.
+
+## Description
+
+Detecting failures on the filesystem that a brick uses makes it possible to handle errors that are caused from outside of the Gluster environment.
+
+There have been hanging brick processes when the underlying storage of a brick became unavailable. A hanging brick process can still use the network and respond to clients, but actual I/O to the storage is impossible and can cause noticeable delays on the client side.
+
+The goal is to provide better detection of storage subsystem failures and prevent bricks from hanging. It should prevent hanging brick processes when the storage hardware or the filesystem fails.
+
+A health-checker (thread) has been added to the posix xlator. This thread periodically checks the status of the filesystem (implies checking of functional storage-hardware).
+
+`glusterd` can detect that the brick process has exited, and `gluster volume status` will show that the brick process is not running anymore. System administrators checking the logs should be able to triage the cause.
+
+## Usage and Configuration
+
+The health-checker is enabled by default and runs a check every 30 seconds. This interval can be changed per volume with:
+
+    # gluster volume set <VOLNAME> storage.health-check-interval <SECONDS>
+
+If `SECONDS` is set to 0, the health-checker will be disabled.
+
+## Failure Detection
+
+Errors are logged to the standard syslog (mostly `/var/log/messages`):
+
+    Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks") error 5 buf count 512
+    Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): I/O Error Detected. Shutting down filesystem
+    Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): Please umount the filesystem and rectify the problem(s)
+    Jun 24 11:31:49 vm130-32 kernel: VFS:Filesystem freeze failed
+    Jun 24 11:31:50 vm130-32 GlusterFS[1969]: [2013-06-24 10:31:50.500674] M [posix-helpers.c:1114:posix_health_check_thread_proc] 0-failing_xfs-posix: health-check failed, going down
+    Jun 24 11:32:09 vm130-32 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
+    Jun 24 11:32:20 vm130-32 GlusterFS[1969]: [2013-06-24 10:32:20.508690] M [posix-helpers.c:1119:posix_health_check_thread_proc] 0-failing_xfs-posix: still alive! -> SIGTERM
+
+The messages labelled with `GlusterFS` in the above output are also written to the logs of the brick process.
+
+## Recovery after a failure
+
+When a brick process detects that the underlying storage is not responding anymore, the process will exit. There is no automated way to restart the brick process; the sysadmin will need to fix the problem with the storage first.
+
+After correcting the storage (hardware or filesystem) issue, the following command will start the brick process again:
+
+    # gluster volume start <VOLNAME> force
+
+## How To Test
+
+The health-checker thread that is part of each brick process will get started automatically when a volume has been started. Verifying its functionality can be done in different ways.
+
+On virtual hardware:
+
+* disconnect the disk from the VM that holds the brick
+
+On real hardware:
+
+* simulate a RAID-card failure by unplugging the card or cables
+
+On a system that uses LVM for the bricks:
+
+* use device-mapper to load an error-table for the disk, see [this description](http://review.gluster.org/5176).
+
+On any system (writing to random offsets of the block device, more difficult to trigger):
+
+1. cause corruption on the filesystem that holds the brick
+2. read contents from the brick, hoping to hit the corrupted area
+3. the filesystem should abort after hitting a bad spot, the health-checker should notice that shortly afterwards
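When testing the health-checker described in brick-failure-detection.md above, it helps to pull only its messages out of the syslog. A minimal sketch, not part of the patch: `health_check_failures` is a hypothetical helper, and the match strings are taken from the sample log lines in the document, so they are an assumption for other versions.

```shell
#!/bin/sh
# Hypothetical filter: reads syslog-style lines on stdin and prints only
# the posix health-checker failure messages.

health_check_failures() {
    grep 'posix_health_check_thread_proc' | grep 'health-check failed'
}

# Demo on two lines copied from the sample output in the document.
printf '%s\n' \
    'Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): I/O Error Detected. Shutting down filesystem' \
    'Jun 24 11:31:50 vm130-32 GlusterFS[1969]: [2013-06-24 10:31:50.500674] M [posix-helpers.c:1114:posix_health_check_thread_proc] 0-failing_xfs-posix: health-check failed, going down' |
    health_check_failures
```

On a real node the same filter could be fed from the syslog, for example `health_check_failures < /var/log/messages`.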
diff --git a/doc/features/file-snapshot.md b/doc/features/file-snapshot.md
new file mode 100644
index 000000000..7f7c419fc
--- /dev/null
+++ b/doc/features/file-snapshot.md
@@ -0,0 +1,91 @@
+#File Snapshot
+This feature gives the ability to take snapshots of files.
+
+##Description
+This feature adds file snapshotting support to glusterfs. Snapshots can be created, deleted and reverted.
+
+To take a snapshot of a file, the file should be in QCOW2 format, as the code for the block layer snapshot has been taken from Qemu and put into gluster as a translator.
+
+With this feature, glusterfs will have better integration with OpenStack Cinder and, in general, the ability to take snapshots of files (typically VM images).
+
+A new extended attribute (xattr) will be added to identify files which are 'snapshot managed' vs raw files.
+
+##Volume Options
+The following volume option needs to be set on the volume for taking file snapshots.
+
+    # features.file-snapshot on
+##CLI parameters
+The following CLI parameters need to be passed with the setfattr command to create, delete and revert file snapshots.
+
+    # trusted.glusterfs.block-format
+    # trusted.glusterfs.block-snapshot-create
+    # trusted.glusterfs.block-snapshot-goto
+##Fully loaded Example
+Download the glusterfs 3.5 rpms from download.gluster.org.
+Install these rpms.
+
+Start glusterd by using the command
+
+    # service glusterd start
+Now create a volume by using the command
+
+    # gluster volume create <vol_name> <brick_path>
+Run the command below to make sure that volume is created.
+
+    # gluster volume info
+Now turn on the snapshot feature on the volume by using the command
+
+    # gluster volume set <vol_name> features.file-snapshot on
+Verify that the option is set by using the command
+
+    # gluster volume info
+The user should be able to see another option in the volume info
+
+    # features.file-snapshot: on
+Now mount the volume using fuse mount
+
+    # mount -t glusterfs <server>:/<vol_name> <mount_point>
+cd into the mount point
+    # cd <mount_point>
+    # touch <file_name>
+The size of the file can be set and its format changed to QCOW2 by running the command below. The file size can be in KB/MB/GB
+
+    # setfattr -n trusted.glusterfs.block-format -v qcow2:<file_size> <file_name>
+Now create another file and send data to that file by running the command
+
+    # echo 'ABCDEFGHIJ' > <data_file1>
+Copy the data from one file to the other by running the command
+
+    # dd if=<data_file1> of=<file_name> conv=notrunc
+Now take the `snapshot of the file` by running the command
+
+    # setfattr -n trusted.glusterfs.block-snapshot-create -v <image1> <file_name>
+Add some more contents to the file and take another file snapshot by doing the following steps
+
+    # echo '1234567890' > <data_file2>
+    # dd if=<data_file2> of=<file_name> conv=notrunc
+    # setfattr -n trusted.glusterfs.block-snapshot-create -v <image2> <file_name>
+Now `revert` both the file snapshots and write data to some files so that data can be compared.
+
+    # setfattr -n trusted.glusterfs.block-snapshot-goto -v <image1> <file_name>
+    # dd if=<file_name> of=<out_file1> bs=11 count=1
+    # setfattr -n trusted.glusterfs.block-snapshot-goto -v <image2> <file_name>
+    # dd if=<file_name> of=<out_file2> bs=11 count=1
+Now read the contents of the files and compare as below:
+
+    # cat <data_file1>, <out_file1>  and compare contents.
+    # cat <data_file2>, <out_file2>  and compare contents.
+##one line description for the variables used
+file_name = File which will be created in the mount point initially.
+
+data_file1 = File which contains data 'ABCDEFGHIJ'
+
+image1 = First file snapshot which has 'ABCDEFGHIJ' + some null values.
+
+data_file2 = File which contains data '1234567890'
+
+image2 = second file snapshot which has '1234567890' + some null values.
+
+out_file1 = After reverting image1 this contains 'ABCDEFGHIJ'
+
+out_file2 = After reverting image2 this contains '1234567890'
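The `bs=11 count=1` in the file-snapshot.md example above reads back exactly the 10 characters plus the newline that `echo` appends. A minimal local sketch of that write and read-back, not part of the patch: it uses plain files in a temporary directory, with no glusterfs mount, no QCOW2 and no snapshots, and `roundtrip_demo` is a hypothetical name.

```shell
#!/bin/sh
# Local illustration only: shows the dd mechanics of the example, not snapshots.

roundtrip_demo() {
    workdir=$(mktemp -d) || return 1
    # 10 characters + newline = 11 bytes
    echo 'ABCDEFGHIJ' > "$workdir/data_file1"
    # Stand-in for the image file: 1 KiB of zero bytes.
    dd if=/dev/zero of="$workdir/file_name" bs=1024 count=1 2>/dev/null
    # Overwrite the start of the file without truncating the rest.
    dd if="$workdir/data_file1" of="$workdir/file_name" conv=notrunc 2>/dev/null
    # Read back exactly the first 11 bytes.
    dd if="$workdir/file_name" of="$workdir/out_file1" bs=11 count=1 2>/dev/null
    if cmp -s "$workdir/data_file1" "$workdir/out_file1"; then
        result=match
    else
        result=differ
    fi
    size=$(wc -c < "$workdir/file_name" | tr -d ' ')
    rm -rf "$workdir"
    # Prints the comparison result and the final size of the stand-in file.
    echo "$result $size"
}

roundtrip_demo
```

Because of `conv=notrunc`, the stand-in file keeps its original size while only its first 11 bytes change, which is why the example can compare `<out_file1>` directly against `<data_file1>`.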
diff --git a/doc/features/nufa.md b/doc/features/nufa.md
new file mode 100644
index 000000000..03b8194b4
--- /dev/null
+++ b/doc/features/nufa.md
@@ -0,0 +1,20 @@
+# NUFA Translator
+
+NUFA ("Non Uniform File Access") is a variant of the DHT ("Distributed Hash
+Table") translator, intended for use with workloads that have a high locality
+of reference.  Instead of placing new files pseudo-randomly, it places them on
+the same nodes where they are created so that future accesses can be made
+locally.  For replicated volumes, this means that one copy will be local and
+others will be remote; the read-replica selection mechanisms will then favor
+the local copy for reads.  For non-replicated volumes, the only copy will be
+local.
+
+## Interface
+
+Use of NUFA is controlled by a volume option, as follows.
+
+	gluster volume set myvolume cluster.nufa on
+
+This will cause the NUFA translator to be used wherever the DHT translator
+otherwise would be.  The rest is all automatic.
+
diff --git a/doc/features/server-quorum.md b/doc/features/server-quorum.md
new file mode 100644
index 000000000..7b20084ce
--- /dev/null
+++ b/doc/features/server-quorum.md
@@ -0,0 +1,44 @@
+# Server Quorum
+
+Server quorum is a feature intended to reduce the occurrence of "split brain"
+after a brick failure or network partition.  Split brain happens when different
+sets of servers are allowed to process different sets of writes, leaving data
+in a state that can not be reconciled automatically.  The key to avoiding split
+brain is to ensure that there can be only one set of servers - a quorum - that
+can continue handling writes.  Server quorum does this by the brutal but
+effective means of forcing down all brick daemons on cluster nodes that can no
+longer reach enough of their peers to form a majority.  Because there can only
+be one majority, there can be only one set of bricks remaining, and thus split
+brain can not occur.
+
+## Options
+
+Server quorum is controlled by two parameters:
+
+ * **cluster.server-quorum-type**
+ 
+   This value may be "server" to indicate that server quorum is enabled, or
+   "none" to mean it's disabled.
+	
+ * **cluster.server-quorum-ratio**
+
+   This is the percentage of cluster nodes that must be up to maintain quorum.
+   More precisely, this percentage of nodes *plus one* must be up.
+
+Note that these are cluster-wide flags.  All volumes served by the cluster will
+be affected.  Once these values are set, quorum actions - starting or stopping
+brick daemons in response to node or network events - will be automatic.
+
+## Best Practices
+
+If a cluster with an even number of nodes is split exactly down the middle,
+neither half can have quorum (which requires **more than** half of the total).
+This is particularly important when N=2, in which case the loss of either node
+leads to loss of quorum.  Therefore, it is highly advisable to ensure that the
+cluster size is three or greater.  The "extra" node in this case need not have
+any bricks or serve any data.  It need only be present to preserve the notion
+of a quorum majority less than the entire cluster membership, allowing the
+cluster to survive the loss of a single node without losing quorum.
+
+
+
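The majority rule from the Best Practices section of server-quorum.md above (quorum requires more than half of the total) can be checked with a few lines of arithmetic. A minimal sketch, not part of the patch: `has_quorum` is a hypothetical helper that models only the simple-majority case and does not model `cluster.server-quorum-ratio`.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when strictly more than half of the
# cluster nodes are reachable.

has_quorum() {
    # $1 = total nodes in the cluster, $2 = nodes currently up
    [ $(( $2 * 2 )) -gt "$1" ]
}

# Lose a single node in clusters of 2, 3 and 4 nodes.
for total in 2 3 4; do
    up=$(( total - 1 ))
    if has_quorum "$total" "$up"; then
        echo "$total nodes, 1 down: quorum kept"
    else
        echo "$total nodes, 1 down: quorum lost"
    fi
done
```

The two-node case is the one the document warns about: one node out of two is exactly half, not more than half, so quorum is lost.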
diff --git a/doc/gluster.8 b/doc/gluster.8
index 3c78fb8b1..1d2a4d097 100644
--- a/doc/gluster.8
+++ b/doc/gluster.8
@@ -71,7 +71,10 @@ If you remove the brick, the data stored in that brick will not be available. Yo
 .B replace-brick
 option.
 .TP
-\fB\ volume rebalance-brick <VOLNAME>(<BRICK> <NEW-BRICK>) start \fR
+\fB\ volume replace-brick <VOLNAME> (<BRICK> <NEW-BRICK>) start|pause|abort|status|commit \fR
+Replace the specified brick.
+.TP
+\fB\ volume rebalance <VOLNAME> start \fR
 Start rebalancing the specified volume.
 .TP
 \fB\ volume rebalance <VOLNAME> stop \fR
@@ -80,8 +83,6 @@ Stop rebalancing the specified volume.
 \fB\ volume rebalance <VOLNAME> status \fR
 Display the rebalance status of the specified volume.
 .TP
-\fB\ volume replace-brick <VOLNAME> (<BRICK> <NEW-BRICK>) start|pause|abort|status|commit \fR
-Replace the specified brick.
 .SS "Log Commands"
 .TP
 \fB\ volume log filename <VOLNAME> [BRICK] <DIRECTORY> \fB
diff --git a/doc/mount.glusterfs.8 b/doc/mount.glusterfs.8
index e6061ffc6..32260ced0 100644
--- a/doc/mount.glusterfs.8
+++ b/doc/mount.glusterfs.8
@@ -62,6 +62,9 @@ Mount the filesystem read-only
 \fBenable\-ino32=\fRBOOL
 Use 32-bit inodes when mounting to workaround broken applications that don't
 support 64-bit inodes
+.TP
+\fBmem\-accounting
+Enable internal memory accounting
 
 .PP
 .SS "Advanced options"
@@ -108,6 +111,22 @@ Provide list of backup volfile servers in the following format [default: None]
 \fB         <server1>:/<volname> <mount_point>
 
 .TP
+.TP
+\fBfetch-attempts=\fRN
+\fBDeprecated\fR option - placed here for backward compatibility [default: 1]
+.TP
+.TP
+\fBbackground-qlen=\fRN
+Set fuse module's background queue length to N [default: 64]
+.TP
+\fBno\-root\-squash=\fRBOOL
+disable root squashing for the trusted client [default: off]
+.TP
+\fBroot\-squash=\fRBOOL
+enable root squashing for the trusted client [default: on]
+.TP
+\fBuse\-readdirp=\fRBOOL
+Use readdirp() mode in fuse kernel module [default: on]
 .PP
 .SH FILES
 .TP
diff --git a/doc/network_compression.md b/doc/network_compression.md
new file mode 100644
index 000000000..7327591ef
--- /dev/null
+++ b/doc/network_compression.md
@@ -0,0 +1,71 @@
+#On-Wire Compression + Decompression
+
+The 'compression translator' compresses and decompresses data in-flight
+between client and bricks.
+
+###Working
+When a writev call occurs, the client compresses the data before sending it to
+brick. On the brick, compressed data is decompressed. Similarly, when a readv
+call occurs, the brick compresses the data before sending it to client. On the
+client, the compressed data is decompressed. Thus, the amount of data sent over
+the wire is minimized. Compression/Decompression is done using the zlib library.
+
+During normal operation, this is the format of data sent over wire:
+
+~~~
+<compressed-data> + trailer(8 bytes)
+~~~
+
+The trailer contains the CRC32 checksum and length of original uncompressed
+data. This is used for validation.
+
+###Usage
+
+Turning on compression xlator:
+
+~~~
+gluster volume set <vol_name> network.compression on
+~~~
+
+###Configurable parameters (optional)
+
+**Compression level**
+~~~
+gluster volume set <vol_name> network.compression.compression-level 8
+~~~
+
+~~~
+0  : no compression
+1  : best speed
+9  : best compression
+-1 : default compression
+~~~
+
+**Minimum file size**
+
+~~~
+gluster volume set <vol_name> network.compression.min-size 50
+~~~
+
+Data is compressed only when its size exceeds the above value in bytes.
+
+**Other parameters**
+
+Other less frequently used parameters include `network.compression.mem-level`
+and `network.compression.window-size`. More details about these options
+can be found by running the `gluster volume set help` command.
+
+###Known Issues and Limitations
+
+* The compression translator cannot work with striped volumes.
+* The mount point hangs when writing a file with the write-behind xlator turned on. To
+overcome this, turn off `performance.write-behind` entirely OR
+set `performance.strict-write-ordering` to on.
+* For glusterfs versions <= 3.5, the compression translator can ONLY work with pure
+distribute volumes. This limitation is caused by AFR not being able to
+propagate xdata. This issue has been fixed in glusterfs versions > 3.5
+
+###TODO
+Although zlib offers high compression ratio, it is very slow. We can make the
+translator pluggable to add support for other compression methods such as
+[lz4 compression](https://code.google.com/p/lz4/)
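The compression levels listed in network_compression.md above (-1 for the default, 0 for none, 1 to 9 from fastest to best) can be validated before the option is set. A minimal sketch, not part of the patch: `compression_level_cmd` is a hypothetical helper, `myvol` is a placeholder, and the accepted range is assumed from zlib's usual levels; the function only prints the command.

```shell
#!/bin/sh
# Hypothetical helper: accepts -1 or a single digit 0..9 and prints the
# matching volume set command; anything else is rejected.

compression_level_cmd() {
    # $1 = volume name, $2 = compression level
    case $2 in
        -1|[0-9])
            echo "gluster volume set $1 network.compression.compression-level $2"
            ;;
        *)
            echo "invalid compression level: $2" >&2
            return 1
            ;;
    esac
}

# Placeholder volume name, assumed for illustration only.
compression_level_cmd myvol 8
```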
diff --git a/doc/upgrade/quota-upgrade-steps.md b/doc/upgrade/quota-upgrade-steps.md
new file mode 100644
index 000000000..402fbdf65
--- /dev/null
+++ b/doc/upgrade/quota-upgrade-steps.md
@@ -0,0 +1,79 @@
+Upgrade Steps For Quota
+=======================
+
+The upgrade process for quota involves executing two upgrade scripts:   
+1. pre-upgrade-script-for-quota.sh, and   
+2. post-upgrade-script-for-quota.sh
+
+Pre-Upgrade Script:
+==================
+
+###What it does:
+
+The pre-upgrade script (pre-upgrade-script-for-quota.sh) iterates over the list of volumes that have quota enabled and captures the configured quota limits for each such volume in a file under /var/tmp/glusterfs/quota-config-backup/vol_&lt;VOLNAME&gt; by executing 'quota list' command on each one of them.
+
+###Pre-requisites for running Pre-Upgrade Script:
+
+1. Make sure glusterd and the brick processes are running on all nodes in the cluster.
+2. The pre-upgrade script must be run prior to the upgrade.
+3. The pre-upgrade script must be run on only one of the nodes in the cluster.
+
+###Location:
+pre-upgrade-script-for-quota.sh must be retrieved from the source tree under the 'extras' directory.
+
+###Invocation:
+Invoke the script by executing `./pre-upgrade-script-for-quota.sh` from the shell on any one of the nodes in the cluster.
+
+* Example:   
+  <code>
+  [root@server1 extras]#./pre-upgrade-script-for-quota.sh
+  </code>
+
+Post-Upgrade Script:
+===================
+
+###What it does:
+The post-upgrade script (post-upgrade-script-for-quota.sh)  picks the volumes that have quota enabled.
+
+Because the cluster must be operating at op-version 3 for quota to work, the 'default-soft-limit' for each of these volumes is set to 80% (which is its default value) via `volume set` operation as an explicit trigger to bump up the op-version of the cluster and also to trigger a re-write of volfiles which knocks quota off client volume file.
+
+Once this is done, these volumes are started forcefully using `volume start force` to launch the Quota Daemon on all the nodes.
+
+Thereafter, for each of these volumes, the paths and the limits configured on them are retrieved from the backed up file /var/tmp/glusterfs/quota-config-backup/vol_&lt;VOLNAME&gt; and limits are set on them via the `quota limit-usage` interface.
+
+####Note:
+In the new version of quota, the command `quota limit-usage` will fail if the directory on which quota limit is to be set for a given volume does not exist. Therefore, it is advised that you create these directories first before running post-upgrade-script-for-quota.sh if you want limits to be set on these directories.
+
+###Pre-requisites for running Post-Upgrade Script:
+1. The post-upgrade script must be executed after all the nodes in the cluster have been upgraded.
+2. All the clients accessing the given volume must also be upgraded before the script is run.
+3. Make sure glusterd and the brick processes are running on all nodes in the cluster post upgrade.
+4. The script must be run from the same node where the pre-upgrade script was run.
+
+
+###Location:
+post-upgrade-script-for-quota.sh can be found under the 'extras' directory of the source tree for glusterfs.
+
+###Invocation:
+post-upgrade-script-for-quota.sh takes one command line argument. This argument could be one of the following:
+1. the name of the volume which has quota enabled; or
+2. 'all'.
+
+In the first case, invoke post-upgrade-script-for-quota.sh from the shell for each volume with quota enabled, with the name of the volume passed as an argument in the command-line:
+
+* Example:   
+  For a volume "vol1" on which quota is enabled, invoke the script in the following way:
+  <code>
+  [root@server1 extras]#./post-upgrade-script-for-quota.sh vol1
+  </code>
+
+In the second case, the post-upgrade script itself picks the volumes on which quota is enabled and executes the post-upgrade procedure on each of them. In this case, invoke post-upgrade-script-for-quota.sh from the shell with 'all' passed as the argument on the command line:
+
+* Example:   
+  <code>
+  [root@server1 extras]#./post-upgrade-script-for-quota.sh all
+  </code>
+
+####Note:
+1. In the second case, post-upgrade-script-for-quota.sh exits prematurely upon failure to upgrade any given volume. In that case, you may run post-upgrade-script-for-quota.sh individually (using the volume name as the command line argument) on this volume and also on all quota-enabled volumes appearing after it in the output of `gluster volume list`.
+2. The backed up files under /var/tmp/glusterfs/quota-config-backup/ are retained after the post-upgrade procedure for reference.
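The recovery step in the first note of quota-upgrade-steps.md above (rerun the script on the failed volume and on every later volume in `gluster volume list`) can be sketched as a small filter. This is a minimal sketch, not part of the patch: `volumes_from` is a hypothetical helper, the volume names are placeholders, and the filter does not check whether quota is enabled, so that selection still has to be done by hand.

```shell
#!/bin/sh
# Hypothetical helper: reads one volume name per line on stdin and prints
# the named volume plus every volume listed after it.

volumes_from() {
    # $1 = volume on which the 'all' run failed
    awk -v start="$1" '$0 == start { found = 1 } found'
}

# Demo with placeholder names standing in for `gluster volume list` output.
printf '%s\n' vol1 vol2 vol3 | volumes_from vol2 |
while read -r vol; do
    echo "./post-upgrade-script-for-quota.sh $vol"
done
```

On a real cluster the input would come from `gluster volume list` instead of `printf`, with volumes that do not have quota enabled skipped before the script is invoked.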