glusterfs.git/xlators/mgmt/glusterd/src/glusterd-volume-set.c, branch v3.5qa2

cli/glusterd: Changes to quota command

2013-11-26T18:25:27+00:00

Quota feature re-work.

Following are the cli commands that are new/re-worked:
======================================================

volume quota  {enable|disable|list [ ...]|remove | default-soft-limit } |
volume quota  {limit-usage   []} |
volume quota  {alert-time|soft-timeout|hard-timeout} {}
volume status [all |  [nfs|shd||quotad]] [detail|clients|mem|inode|fd|callpool]
volume statedump  [nfs|quotad] [all|mem|iobuf|callpool|priv|fd|inode|history]

glusterd changes:
=================
* Quota limits are now set as extended attributes by glusterd from
  the aux mount created by the cli.
* The gfids of the directories on which quota limits are set
  for a given volume are stored in binary format in the
  /var/lib/glusterd/vols//quota.conf file; its cksum and version
  are stored in /var/lib/glusterd/vols//quota.cksum.

Original-author: Krutika Dhananjay 
Original-author: Krishnan Parthasarathi 

BUG: 969461
Change-Id: If32bba36c67f9c2a30417af9c6389045b2b7c13b
Signed-off-by: Krutika Dhananjay 
Signed-off-by: Raghavendra G 
Reviewed-on: http://review.gluster.org/6003
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

Transparent data encryption and metadata authentication

2013-11-13T23:12:49+00:00

.. in the systems with non-trusted server

This new functionality can be useful in various cloud technologies.
It is implemented via a special encryption/crypt translator, which
works on the client side and performs encryption and authentication.

              1. Class of supported algorithms

The crypt translator can support any atomic symmetric block cipher
algorithm, i.e. one that requires plain/cipher text to be padded before
the encryption/decryption transform is performed (see the glossary in
atom.c for definitions). In particular, it can support algorithms with
the EOF issue (which require the end of the file to be padded with
extra data).

The crypt translator performs the translation
user -> (offset, size) -> (aligned-offset, padded-size) -> server
(and backward), and resolves individual FOPs (write(), truncate(),
etc.) into read-modify-write sequences.
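
The (offset, size) -> (aligned-offset, padded-size) step can be
illustrated with a small sketch. This is not code from the translator;
the function name align_request and the 4096-byte default (taken from
the "block-size" option described later) are assumptions made for
illustration only.

```python
from typing import Tuple


def align_request(offset: int, size: int, block_size: int = 4096) -> Tuple[int, int]:
    """Expand a byte range so that it starts and ends on atom boundaries."""
    if offset < 0 or size < 0 or block_size <= 0:
        raise ValueError("offset and size must be >= 0, block_size must be > 0")
    # Round the start of the range down to an atom boundary.
    aligned_offset = offset - (offset % block_size)
    # Round the end of the range up to an atom boundary.
    end = offset + size
    padded_end = -(-end // block_size) * block_size
    return aligned_offset, padded_end - aligned_offset


# A 100-byte request at offset 5000 falls entirely inside the second atom.
print(align_request(5000, 100))  # (4096, 4096)
```

A write that does not cover a whole atom therefore has to read the
atom, modify it and write it back, which is the read-modify-write
sequence mentioned above.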

A volume can contain files encrypted by different algorithms of the
mentioned class. To change some option value just reconfigure the
volume.

Currently only one algorithm is supported: AES_XTS.

Examples of algorithms that cannot be supported by the crypt
translator:

1. Asymmetric block cipher algorithms, which inflate data, e.g. RSA;
2. Symmetric block cipher algorithms with inline MACs for data
   authentication.

                   2. Implementation notes.

a) Atomic algorithms

Since any process in a stackable file system manipulates local
data (which can be made obsolete by the local data of another process),
any atomic cipher algorithm without proper support can lead to
non-POSIX behavior. To resolve such "collisions" we introduce locks:
before performing FOP->read(), FOP->write(), etc., the process must
first lock the file.

b) Algorithms with EOF issue

Such algorithms require the end of the file to be padded with extra
data. Without proper support this results in losing information about
the real file size. Keeping track of the real file size is the
responsibility of the crypt translator. A special extended attribute
named "trusted.glusterfs.crypt.att.size" is used for this purpose. All
files stored on the bricks of an encrypted volume have "padded" sizes.
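
A minimal sketch of the idea, with encryption left out entirely: the
stored data is padded to a whole number of atoms and the real size is
kept next to it under the attribute name. The dictionary standing in
for extended attributes, the zero padding bytes and the decimal-string
encoding of the size are assumptions of this sketch, not the on-disk
format.

```python
SIZE_XATTR = "trusted.glusterfs.crypt.att.size"


def store_padded(data: bytes, block_size: int = 4096):
    """Return (padded data, xattrs) as a brick would hold them in this model."""
    pad = -len(data) % block_size
    padded = data + b"\x00" * pad  # padding content is an assumption
    # Assumption: the real size is recorded as a decimal string.
    xattrs = {SIZE_XATTR: str(len(data)).encode("ascii")}
    return padded, xattrs


def load_real(padded: bytes, xattrs: dict) -> bytes:
    """Recover the user-visible bytes by trimming to the recorded real size."""
    real_size = int(xattrs[SIZE_XATTR].decode("ascii"))
    return padded[:real_size]


padded, xattrs = store_padded(b"hello", block_size=512)
print(len(padded), load_real(padded, xattrs))
```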

                  3. Non-trusted servers and
                     Metadata authentication

We assume that the server on which the user's data is stored is
non-trusted. This means that the server can be subjected to various
attacks aimed at revealing the user's encrypted personal data. We
provide protection against such attacks.

Every encrypted file has specific private attributes (cipher algorithm
id, atom size, etc.), which are packed into a string (the so-called
"format string") and stored as a special extended attribute named
"trusted.glusterfs.crypt.att.cfmt". We protect the string from
tampering. This protection is mandatory, hardcoded and always on.
Without such protection various attacks (based on extending the scope
of per-file secret keys) are possible.

Our authentication method has been developed in close collaboration
with the Red Hat security team and is implemented as "metadata loader
of version 1" (see file metadata.c). This method is NIST-compliant and
is based on checking 8-byte per-hardlink MACs, created (updated) by
FOP->create(), FOP->link(), FOP->unlink() and FOP->rename() from the
following unique entities:

. file (hardlink) name;
. verified file's object id (gfid).
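
The per-hardlink check can be pictured with the sketch below. The
commit does not say which MAC construction is used, so HMAC-SHA256
truncated to 8 bytes is only a stand-in; the function names, the key
and the way name and gfid are concatenated are likewise assumptions.

```python
import hashlib
import hmac
import uuid


def link_mac(link_key: bytes, name: str, gfid: uuid.UUID) -> bytes:
    """Compute an 8-byte MAC over a hardlink name and the file's gfid."""
    message = gfid.bytes + name.encode("utf-8")
    # Stand-in construction: truncated HMAC-SHA256 (an assumption).
    return hmac.new(link_key, message, hashlib.sha256).digest()[:8]


def verify_link(link_key: bytes, name: str, gfid: uuid.UUID, stored: bytes) -> bool:
    """Constant-time comparison of the stored MAC with a freshly computed one."""
    return hmac.compare_digest(link_mac(link_key, name, gfid), stored)


key = b"k" * 32                      # placeholder key
gfid = uuid.UUID(int=1)              # hypothetical gfid
mac = link_mac(key, "report.txt", gfid)
print(verify_link(key, "report.txt", gfid, mac))   # name matches
print(verify_link(key, "renamed.txt", gfid, mac))  # name changed without updating the MAC
```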

Before manipulating a file, we always check its MACs at
FOP->open() time. Some FOPs do not require a file to be opened (e.g.
FOP->truncate()). In such cases the crypt translator opens the file
unconditionally.

                        4. Generating keys

Unique per-file keys are derived by NIST-compliant methods from:

a) the parent key;
b) the unique verified object-id of the file (gfid).

The per-volume master key, provided by the user at mount time, is at
the root of this "tree of keys".

Those keys are used to:

1) encrypt/decrypt file data;
2) encrypt/decrypt file metadata;
3) create per-file and per-link MACs for metadata authentication.
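
The "tree of keys" can be sketched as repeated derivation from a parent
key and the gfid. The commit only states that the derivation is
NIST-compliant and does not name the function, so HMAC-SHA256 below is
a stand-in; derive_key, the labels and the two-level layout are all
hypothetical.

```python
import hashlib
import hmac
import uuid


def derive_key(parent_key: bytes, label: bytes, gfid: uuid.UUID) -> bytes:
    """Derive a 32-byte child key from a parent key and a file's gfid.

    HMAC-SHA256 is used only as a stand-in for the unspecified method.
    """
    message = label + b"\x00" + gfid.bytes
    return hmac.new(parent_key, message, hashlib.sha256).digest()


master_key = bytes.fromhex("00" * 32)                      # placeholder, not a real key
gfid = uuid.UUID("12345678-1234-5678-1234-567812345678")  # hypothetical gfid

# Root of the tree: the per-volume master key. Labels are assumptions.
file_key = derive_key(master_key, b"file", gfid)
data_key = derive_key(file_key, b"data", gfid)      # file data encryption
meta_key = derive_key(file_key, b"metadata", gfid)  # file metadata encryption
mac_key = derive_key(file_key, b"mac", gfid)        # per-file and per-link MACs
```

Because the gfid enters every derivation, two files under the same
master key end up with unrelated keys.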

                          5. Instructions
                 Getting started with crypt translator

Example:

1) Create a volume "myvol" and enable encryption:

   # gluster volume create myvol pepelac:/vols/xvol
   # gluster volume set myvol encryption on

2) Set location (absolute pathname) of your master key:

   # gluster volume set myvol encryption.master-key /home/me/mykey

3) Set other options to override default options, if needed.
   Start the volume.

4) On the client side make sure that the file /home/me/mykey exists
   and contains the proper per-volume master key (a 256-bit AES
   key). This key has to be in hex form, i.e. it should be represented
   by 64 symbols from the set {'0', ..., '9', 'a', ..., 'f'}.
   The key should start at the beginning of the file. All symbols at
   offsets >= 64 are ignored.

5) Mount the volume "myvol" on the client side:

   # glusterfs --volfile-server=pepelac --volfile-id=myvol /mnt

   After a successful mount the file which contains the master key may
   be removed. NOTE: Keeping the master key between mount sessions is
   the user's responsibility.
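
The key file format from step 4 (64 hex symbols at the start of the
file, everything from offset 64 on ignored) can be checked before
mounting with a sketch like the following. parse_master_key is a
hypothetical helper, not part of GlusterFS, and it rejects uppercase
hex digits only because the set given in step 4 lists lowercase ones.

```python
import string

HEX_SYMBOLS = set(string.digits + "abcdef")


def parse_master_key(text: str) -> bytes:
    """Return the 32-byte key encoded by the first 64 symbols of text."""
    head = text[:64]  # symbols at offsets >= 64 are ignored
    if len(head) < 64 or any(c not in HEX_SYMBOLS for c in head):
        raise ValueError("master key must start with 64 symbols from 0-9, a-f")
    return bytes.fromhex(head)


# Trailing content after the 64th symbol does not matter.
key = parse_master_key("0f" * 32 + "\nanything else\n")
print(len(key) * 8)  # 256
```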

**********************************************************************

WARNING! Losing the master key will make the content of all regular
files inaccessible. Mounting with a wrong master key still gives access
to the content of directories: file names are not encrypted.

**********************************************************************

               6. Options of crypt translator

1) "master-key": specifies location (absolute pathname) of the file
   which contains per-volume master key. There is no default location
   for master key.

2) "data-key-size": specifies the size of the per-file key for data
   encryption. Possible values:
   . "256" (default value)
   . "512"

3) "block-size": specifies the atom size. Possible values:
   . "512"
   . "1024"
   . "2048"
   . "4096" (default value)

                       7. Test cases

Any workload, which involves the following file operations:

->create();
->open();
->readv();
->writev();
->truncate();
->ftruncate();
->link();
->unlink();
->rename();
->readdirp().

                        8. TODOs:

1) Currently size of IOs issued by crypt translator is restricted
   by block_size (4K by default). We can use larger IOs to improve
   performance.

Change-Id: I2601fe95c5c4dc5b22308a53d0cbdc071d5e5cee
BUG: 1030058
Signed-off-by: Edward Shishkin 
Signed-off-by: Anand Avati 
Reviewed-on: http://review.gluster.org/4667
Tested-by: Gluster Build System

bd: Add aio support to BD xlator

2013-11-13T19:39:11+00:00

Volume option bd-aio controls AIO feature for BD xlator. Code taken from
posix-aio.c

Change-Id: Ib049bd59c9d3f9101d33939838322cfa808de053
BUG: 1028672
Signed-off-by: M. Mohan Kumar 
Reviewed-on: http://review.gluster.org/5748
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

features/compress: Compression/DeCompression translator

2013-11-12T03:35:01+00:00

* When a writev call occurs, the client compresses the data before
  sending it to server. On the server, compressed data is decompressed.
  Similarly, when a readv call occurs, the server compresses the data
  before sending it to client. On the client, the compressed data is
  decompressed. Thus the amount of data sent over the wire is minimized.

* Compression/decompression is done using the zlib library.

* During normal operation, this is the format of data sent over the wire:
   + trailer(8)
  The trailer contains the CRC32 checksum and length of original
  uncompressed data. This is used for validation.
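
The 8-byte trailer can be illustrated with Python's zlib module. The
commit states only that the trailer holds the CRC32 checksum and the
length of the original data; the field order, the little-endian 4+4
byte layout and the function names pack/unpack are assumptions of this
sketch, and the translator's actual wire format may differ.

```python
import struct
import zlib

# Assumed layout: CRC32 then original length, 4 bytes each, little-endian.
TRAILER = struct.Struct("<II")


def pack(data: bytes, level: int = 8) -> bytes:
    """Compress data and append the 8-byte validation trailer."""
    return zlib.compress(data, level) + TRAILER.pack(zlib.crc32(data), len(data))


def unpack(wire: bytes) -> bytes:
    """Decompress a payload and validate it against its trailer."""
    body, trailer = wire[:-TRAILER.size], wire[-TRAILER.size:]
    crc, length = TRAILER.unpack(trailer)
    data = zlib.decompress(body)
    if len(data) != length or zlib.crc32(data) != crc:
        raise ValueError("trailer does not match decompressed data")
    return data


payload = b"gluster" * 100
wire = pack(payload)
assert unpack(wire) == payload
print(len(payload), "->", len(wire))
```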

HOW TO USE
----------
Turning on compression xlator:
gluster volume set  compress on

Configurable options:
gluster volume set  compress.compression-level 8
gluster volume set  compress.min-size 50

Change-Id: Ib7a66b6f1f70fe002b7c513588cdf75c69370805
BUG: 923540
Original-author : Venky Shankar 
Signed-off-by: Venky Shankar 
Signed-off-by: Prashanth Pai 
Reviewed-on: http://review.gluster.org/3251
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

rpcsvc: implement per-client RPC throttling

2013-10-28T07:33:19+00:00

Implement a limit on the total number of outstanding RPC requests
from a given client. Once the limit is reached, the client socket
is removed from POLL-IN event polling.
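
The throttling rule can be modelled as a per-client counter. This is a
toy model rather than rpcsvc code: the class name is hypothetical, and
resuming polling once the count drops back below the limit is an
assumption, since the message above only describes the removal.

```python
class ClientThrottle:
    """Track outstanding requests for one client connection."""

    def __init__(self, limit: int):
        self.limit = limit
        self.outstanding = 0
        self.polling = True  # True while the socket is polled for POLL-IN

    def request_received(self) -> None:
        self.outstanding += 1
        if self.outstanding >= self.limit:
            # Limit reached: stop reading new requests from this client.
            self.polling = False

    def request_completed(self) -> None:
        self.outstanding -= 1
        if self.outstanding < self.limit:
            # Assumption: polling resumes once there is room again.
            self.polling = True


throttle = ClientThrottle(limit=2)
throttle.request_received()
throttle.request_received()
print(throttle.polling)  # False
throttle.request_completed()
print(throttle.polling)  # True
```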

Change-Id: I8071b8c89b78d02e830e6af5a540308199d6bdcd
BUG: 1008301
Signed-off-by: Anand Avati 
Reviewed-on: http://review.gluster.org/6114
Reviewed-by: Santosh Pradhan 
Reviewed-by: Rajesh Joseph 
Reviewed-by: Harshavardhana 
Reviewed-by: Vijay Bellur 
Tested-by: Gluster Build System

gNFS: NFS daemon is limiting IOs to 64KB

2013-09-20T18:29:34+00:00

Problem:
The Gluster NFS server hard-codes the maximum rsize/wsize to 64KB,
which is too low for NFS running over a 10GE NIC. The existing
options nfs.read-size and nfs.write-size do not work as
expected.

FIX:
Make the options nfs.read-size (for rsize) and nfs.write-size
(for wsize) work to tune the NFS I/O size. The value range is
4KB (min), 64KB (default), 1MB (max).
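
The value range can be expressed as a small helper. Whether the server
clamps or rejects an out-of-range value is not stated above; clamping
is an assumption of this sketch, as is the function name.

```python
KB = 1024
MIN_IO_SIZE = 4 * KB
DEFAULT_IO_SIZE = 64 * KB
MAX_IO_SIZE = 1024 * KB


def effective_io_size(requested=None) -> int:
    """Return the I/O size to use for nfs.read-size or nfs.write-size."""
    if requested is None:
        return DEFAULT_IO_SIZE
    # Assumption: out-of-range values are clamped rather than rejected.
    return max(MIN_IO_SIZE, min(MAX_IO_SIZE, requested))


print(effective_io_size())            # 65536
print(effective_io_size(512 * KB))    # 524288
print(effective_io_size(8 * KB * KB)) # 1048576
```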

NB: Credit to "Richard Wareing" for catching it.

Change-Id: I2754ecb0975692304308be8bcf496c713355f1c8
BUG: 1009223
Signed-off-by: Santosh Kumar Pradhan 
Reviewed-on: http://review.gluster.org/5964
Tested-by: Gluster Build System 
Reviewed-by: Kaleb KEITHLEY 
Reviewed-by: Anand Avati

features/marker: force xtime updates (configurable) for client-pid = -1

2013-09-05T03:43:03+00:00

This is required by Geo-Replication, which does an auxiliary mount
with client-pid set to -1 (which gets special treatment at specific
places in GlusterFS), to trigger xtime updates on the intermediate
master in a cascading setup.

Marker also had a check to "not" mark updates from geo-replication's
auxiliary mounts. With the new geo-replication design, xtimes are
not set by the master on the slave for all entities. Because of this,
cascading setups were broken.

This patch introduces the "geo-replication.ignore-pid-check" option
as an "override" for the client-pid check on gsyncd's client-pid.
When this option is enabled, marker starts "marking" even if the
updates come from the special client.
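
The resulting decision can be summarized in a few lines. This is a
model of the behavior described above, not marker code; should_mark and
the constant name are hypothetical.

```python
GSYNCD_CLIENT_PID = -1  # client-pid used by geo-replication's auxiliary mount


def should_mark(client_pid: int, ignore_pid_check: bool) -> bool:
    """Decide whether marker records an xtime update for this request."""
    if client_pid == GSYNCD_CLIENT_PID and not ignore_pid_check:
        # Default behavior: updates from the special client are not marked.
        return False
    return True


print(should_mark(-1, ignore_pid_check=False))  # False
print(should_mark(-1, ignore_pid_check=True))   # True
print(should_mark(4321, ignore_pid_check=False))  # True
```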

Geo-Replication enables this option when it detects that it is an
intermediate master.

Change-Id: I9f7140edd12fef5480595ee0f93f35b94cdb8345
BUG: 996371
Signed-off-by: Venky Shankar 
Reviewed-on: http://review.gluster.org/5591
Tested-by: Gluster Build System 
Reviewed-by: Avra Sengupta 
Tested-by: Avra Sengupta 
Reviewed-by: Anand Avati

performance/readdir-ahead: introduce directory read-ahead translator

2013-09-04T16:04:15+00:00

This is a translator to improve the performance of typical,
sequential directory reads (e.g., ls). readdir-ahead begins
preloading the contents of a directory on open and serves readdir
requests from the preloaded content. readdir-ahead is currently
implemented to handle only the single-threaded directory read
case.
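
The idea can be modelled in a few lines. Unlike the translator, this
sketch loads the whole directory eagerly and synchronously at open
time; the class name and the lister parameter are hypothetical and
exist only to keep the example self-contained.

```python
import os
from collections import deque


class ReaddirAhead:
    """Preload directory entries on open and serve readdir from the buffer."""

    def __init__(self, path: str, lister=os.listdir):
        # "Open": fetch the entries up front so later reads hit the buffer.
        self._entries = deque(lister(path))

    def readdir(self, count: int) -> list:
        """Return up to count preloaded entries, in order, without refetching."""
        batch = []
        while self._entries and len(batch) < count:
            batch.append(self._entries.popleft())
        return batch


# A fake lister stands in for a real directory.
d = ReaddirAhead("/some/dir", lister=lambda path: ["a", "b", "c"])
print(d.readdir(2))  # ['a', 'b']
print(d.readdir(2))  # ['c']
```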

readdir-ahead is currently disabled by default. It can be enabled
with the following command:

	gluster volume set  readdir-ahead on

The following are results of a getdents test on a single brick
volume.

Test info:

- Single VM, gluster client/server.
- Volume mounted with native client using --gid-timeout=2.
- getdents on single directory with 100k 0-byte files.

Test results:

- !readdir-ahead

read 3120080 bytes from offset 0
3 MiB, 4348 ops, 0:00:07.00 (416.590 KiB/sec and 594.4737 ops/sec)

- readdir-ahead

read 3120080 bytes from offset 0
3 MiB, 4348 ops, 0:00:03.00 (820.116 KiB/sec and 1170.3043 ops/sec)

BUG: 980517
Change-Id: Ieceb9e1eb47d1d5b5af8da2bf03839537364653f
Signed-off-by: Brian Foster 
Reviewed-on: http://review.gluster.org/4519
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

features/qemu-block: support for QCOW2 and QED formats

2013-09-03T18:26:26+00:00

This patch adds support for internal snapshots using QCOW2 and a
general framework for external snapshots (next patch) with
QCOW2 and QED.

For internal snapshots, the file must be "initialized" or
"formatted" into QCOW2 format with a specified file size.

Snapshots can be created, deleted, and applied ("goto").

e.g:

 // Format and Initialize

sh# setfattr -n trusted.glusterfs.block-format -v qcow2:10GB /mnt/imgfile
sh# ls -l /mnt/imgfile
-rw-r--r-- 1 root root 10G Jul 18 21:20 imgfile

 // Create a snapshot

sh# setfattr -n trusted.glusterfs.block-snapshot-create -v name1 imgfile

 // Apply a snapshot

sh# setfattr -n trusted.glusterfs.block-snapshot-goto -v name1 imgfile

Change-Id: If993e057a9455967ba3fa9dcabb7f74b8b2cf4c3
BUG: 986775
Signed-off-by: Anand Avati 
Reviewed-on: http://review.gluster.org/5367
Tested-by: Gluster Build System 
Reviewed-by: Brian Foster

nfs: persistent caching of connected NFS-clients

2013-08-28T13:54:44+00:00

Introduce /var/lib/glusterfs/nfs/rmtab to contain a list of NFS-clients
which have a volume mounted. The volume option 'nfs.mount-rmtab' can be
set to an alternative filename. When the file is located on shared
storage, multiple gNFS servers can use the same file to present a single
NFS-server.

This cache is read when a system administrator calls 'showmount -a' and
updated when an NFS-client calls MNT or UMNT from the MOUNT protocol.
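
A file-backed cache with this read/update pattern could look like the
sketch below. The one-line-per-client text format, the class and method
names are assumptions, not the rmtab format used by gNFS, and the
sketch does no locking, which a file shared between several servers
would need.

```python
import os
import tempfile


class Rmtab:
    """Persist (client, export) pairs in a plain-text file."""

    def __init__(self, path: str):
        self.path = path

    def _load(self) -> list:
        try:
            with open(self.path) as f:
                return [tuple(line.rstrip("\n").split(" ", 1)) for line in f if line.strip()]
        except FileNotFoundError:
            return []

    def _save(self, entries: list) -> None:
        # Write to a temporary file and rename it over the old one.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            for host, export in entries:
                f.write(f"{host} {export}\n")
        os.replace(tmp, self.path)

    def mnt(self, host: str, export: str) -> None:
        """Record a client on MNT."""
        entries = self._load()
        if (host, export) not in entries:
            entries.append((host, export))
            self._save(entries)

    def umnt(self, host: str, export: str) -> None:
        """Drop a client on UMNT."""
        self._save([e for e in self._load() if e != (host, export)])

    def showmount(self) -> list:
        """Return the current client list, as 'showmount -a' would need it."""
        return self._load()


rmtab = Rmtab(os.path.join(tempfile.mkdtemp(), "rmtab"))
rmtab.mnt("192.0.2.10", "/myvol")
print(rmtab.showmount())
```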

Usage:
- create a volume for storing the shared rmtab file
- mount the volume on all storage servers, at the same location
- make sure that the volume is mounted at boot (add to /etc/fstab)
- place the rmtab file on the volume:
   # gluster volume set  nfs.mount-rmtab /
- any subsequent mount requests will add an entry to this file
- 'showmount -a' requests will return the NFS-clients using the cluster

Note:
The NFS-server does not currently support reconfigure(). When a
configuration option is set/changed, the NFS-server glusterfs process
gets restarted. This causes the active NFS-clients to be forgotten (the
entries are saved in the old rmtab, but we do not have a reference to
that file any more, so we can't re-add them). Therefore a re-mount by
the NFS-clients is needed before they get listed in the rmtab again.

Change-Id: I58f47135d60ad112849d647bea4e1129683dd2b3
BUG: 904065
Signed-off-by: Niels de Vos 
Reviewed-on: http://review.gluster.org/4430
Tested-by: Gluster Build System 
Reviewed-by: Harshavardhana 
Tested-by: Harshavardhana 
Reviewed-by: Rajesh Joseph