summaryrefslogtreecommitdiffstats
path: root/libglusterfs/src/glusterfs.h
Commit message (Collapse)AuthorAgeFilesLines
* features/shard: Get hard-link-count in {unlink,rename}_cbk before deleting ↵Krutika Dhananjay2016-05-231-0/+1
| | | | | | | | | | | | | | | shards Backport of http://review.gluster.org/#/c/14334/ Change-Id: I41321d8b00a10f1bd5b0a7b008f673b1aa240d0c BUG: 1337837 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/14450 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* readdir-ahead: Prefetch xattrs needed by md-cachePrashanth Pai2016-05-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Negative cache feature implementation in md-cache requires xattrs returned by posix to be intercepted for every call that can possibly return xattrs. This includes readdirp(). This is crucial to treat missing keys in cache as a case of negative entry (returns ENODATA) md-cache puts names of xattrs that it wants to cache in xdata and passes it down to posix which returns the specified xattrs in the callback. This is done in lookup() and readdirp(). Hence, a xattr that is cached can be invalidated during readdirp_cbk too. This is based on the assumption that readdirp() will always return all xattrs that md-cache is interested in. However, this is not the case when readdirp() call is served from readdir-ahead's cache. readdir-ahead xlator will pre-fetch dentries during opendir_cbk and readdirp. These internal readdirp() calls made by readdir-ahead xlator does not set xdata in it's requests. Hence, no xattrs are fetched and stored in it's internal cache. This causes metadata loss in gluster-swift. md-cache returns ENODATA during getxattr() call even though the xattr for that object exists on the brick. On receiving ENODATA, gluster-swift will create new metadata and do setxattr(). This results in loss of information stored in existing xattr. Fix: During opendir, md-cache will communicate to readdir-ahead asking it to store the names of xattrs it's interested in so that readdir-ahead can fetch those in all subsequent internal readdirp() calls issued by it. This stored names of xattrs is invalidated/updated on the next real readdirp() call issued by application. This readdirp() call will have xdata set correctly by md-cache xlator. > Reviewed-on: http://review.gluster.org/14214 > Tested-by: Prashanth Pai <ppai@redhat.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.com> > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Smoke: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Tested-by: Raghavendra G <rgowdapp@redhat.com> BUG: 1334700 Change-Id: I32d46f93a99d4ec34c741f3c52b0646d141614f9 (cherry picked from commit 0c73e7050c4d30ace0c39cc9b9634e9c1b448cfb) Reviewed-on: http://review.gluster.org/14282 Tested-by: Prashanth Pai <ppai@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* features/marker: Fix dict_get errors when key is NULLKotresh HR2016-05-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Backport of: >Change-Id: I25e497459441334c13af77b3fec83c42a7a92ac4 >BUG: 1319581 >Signed-off-by: Kotresh HR <khiremat@redhat.com> >Reviewed-on: http://review.gluster.org/13793 >Smoke: Gluster Build System <jenkins@build.gluster.com> >Tested-by: Vijaikumar Mallikarjuna <vmallika@redhat.com> >Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Venky Shankar <vshankar@redhat.com> >Signed-off-by: Kotresh HR <khiremat@redhat.com> Change-Id: I8054ffab3574b6ceb1c7d4290e9f6de3dbf38724 BUG: 1332074 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/14144 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* afr: replica pair going offline does not require CHILD_MODIFIED eventSakshi Bansal2016-04-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | As a part of CHILD_MODIFIED event DHT forgets the current layout and performs fresh lookup. However this is not required when a replica pair goes offline as the xattrs can be read from other replica pairs. Hence setting different event to handle replica pair going down. > Backport of http://review.gluster.org/#/c/12573/ > Change-Id: I5ede2a6398e63f34f89f9d3c9bc30598974402e3 > BUG: 1281230 > Signed-off-by: Sakshi Bansal <sabansal@redhat.com> > Reviewed-on: http://review.gluster.org/12573 > Reviewed-by: Ravishankar N <ravishankar@redhat.com> > Reviewed-by: Susant Palai <spalai@redhat.com> > Tested-by: NetBSD Build System <jenkins@build.gluster.org> > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Change-Id: Ida30240d1ad8b8730af7ab50b129dfb05264fdf9 BUG: 1283972 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/12767 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* quota: setting 'read-only' option in xdata to instruct DHT to not healSakshi Bansal2016-04-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When quota is enabled the quota enforcer tries to get the size of the source directory by sending nameless lookup to quotad. But if the rename is successful even on one subvol or the source layout has anomalies then this nameless lookup in quotad tries to heal the directory which requires a lock on as many subvols as it can. But src is already locked as part of rename. For rename to proceed in brick it needs to complete a cluster-wide lookup. But cluster-wide lookup in quotad is blocked on locks held by rename, hence a deadlock. To avoid this quota sends an option in xdata which instructs DHT not to heal. Backport of http://review.gluster.org/#/c/13988/ > Change-Id: I792f9322331def0b1f4e16e88deef55d0c9f17f0 > BUG: 1252244 > Signed-off-by: Sakshi Bansal <sabansal@redhat.com> > Reviewed-on: http://review.gluster.org/13988 > Smoke: Gluster Build System <jenkins@build.gluster.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.com> > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: I792f9322331def0b1f4e16e88deef55d0c9f17f0 BUG: 1328473 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/14031 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/distribute: detect stale layouts in entry fopsRaghavendra G2016-04-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dht_mkdir () { first-hashed-subvol = hashed-subvol for "bname" in in-memory layout of "parent"; inodelk (SETLKW, parent, "LAYOUT_HEAL_DOMAIN", "can be any subvol, but we choose first-hashed-subvol randomly"); { begin: hashed-subvol = hashed-subvol for "bname" in in-memory layout of "parent"; hash-range = extract hashe-range from layout of "parent"; ret = mkdir (parent/bname, hashed-subvol, hash-range); if (ret == "hash-value doesn't fall into layout stored on the brick (this error is returned by posix-mkdir)") { refresh_parent_layout (); goto begin; } } inodelk (UNLCK, parent, "LAYOUT_HEAL_DOMAIN", "first-hashed-subvol"); proceed with other parts of dht_mkdir; } posix_mkdir (parent/bname, client-hash-range) { disk-hash-range = getxattr (parent, "dht-layout-key"); if (disk-hash-range != client-hash-range) { fail-with-error ("hash-value doesn't fall into layout stored on the brick"); return 0; } continue-with-posix-mkdir; } Similar changes need to be done for dentry operations like create, symlink, link, unlink, rmdir, rename. These will be addressed in subsequent patches. This patch addresses only mkdir codepath. This change breaks stripe tests, as on some striped subvols dht layout xattrs are not set for some reason. This results in failure of mkdir. Since striped volumes are always created with dht, some tests associated with stripe also fail. So, I am making following tests changes (since stripe is out of maintainance): * modify ./tests/basic/rpc-coverage.t to not to use striped volumes * mark all (2) tests in tests/bugs/stripe/ as bad tests Change-Id: Idd1ae879f24a48303dc743c1bb4d91f89a629e25 BUG: 1329062 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/14040 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* glusterd / afr : Enable auto heal when replica count increasesAnuradha Talur2016-03-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/12451 In replicate volumes, when a brick is added to a replicate group, heal to the new brick should be triggered. Also, the new brick should not be considered as source for healing till it is up to date. Previously, extended attributes had to be set manually on the bricks for this to happen. This patch is part 1 patch to automate this process. >Change-Id: I29958448618372bfde23bf1dac5dd23dba1ad98f >BUG: 1276203 >Signed-off-by: Anuradha Talur <atalur@redhat.com> >Reviewed-on: http://review.gluster.org/12451 >Reviewed-by: Atin Mukherjee <amukherj@redhat.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Ravishankar N <ravishankar@redhat.com> >Smoke: Gluster Build System <jenkins@build.gluster.com> Signed-off-by: Anuradha Talur <atalur@redhat.com> Conflicts: libglusterfs/src/globals.h xlators/mgmt/glusterd/src/glusterd-replace-brick.c Change-Id: Ica83592aab8edbe49e2bb9d8d4824cf5c76324b7 BUG: 1320020 Reviewed-on: http://review.gluster.org/13806 Smoke: Gluster Build System <jenkins@build.gluster.com> Tested-by: Anuradha Talur <atalur@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* fuse: Add a new mount option capabilityPoornima G2016-03-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Originally all security.* xattrs were forbidden if selinux is disabled, which was causing Samba's acl_xattr module to not work, as it would store the NTACL in security.NTACL. To fix this http://review.gluster.org/#/c/12826/ was sent, which forbid only security.selinux. This opened up a getxattr call on security.capability before every write fop and others. Capabilities can be used without selinux, hence if selinux is disabled, security.capability cannot be forbidden. Hence adding a new mount option called capability. Only when "--capability" or "--selinux" mount option is used, security.capability is sent to the brick, else it is forbidden. Backport of : http://review.gluster.org/#/c/13540/ & http://review.gluster.org/#/c/13653/ BUG: 1309462 Change-Id: Ib8d4f32d9f1458f4d71a05785f92b526aa7033ff Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/13626 Tested-by: Vijay Bellur <vbellur@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* tier:delete the linkfile if data file creation failsMohammed Rafi KC2016-02-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we are creating data file in a hot subvolume then we will create a linkfile in cold subvolume. Linkfile creation happens first. If linkfile creation was successful and data file creation failed, then linkfile in cold subvolume will become stale This patch will delete the linkfile as well, if data file creation fails Also this code duplicates dht_create to make tier_create backport of> >Change-Id: I377a90dad47f288e9576c7323b23cf694a91a7a3 >BUG: 1290677 >Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> >Reviewed-on: http://review.gluster.org/12948 >Reviewed-by: N Balachandran <nbalacha@redhat.com> >Tested-by: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Raghavendra G <rgowdapp@redhat.com> >Reviewed-by: Dan Lambright <dlambrig@redhat.com> >Tested-by: Dan Lambright <dlambrig@redhat.com> Change-Id: I3689e8ee387fb6731b4b3a7fe8f8190143482eaa BUG: 1295359 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/13161 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* tier:unlink during migrationMohammed Rafi KC2016-02-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | files deleted during promotion were not deleting as the files are moving from hashed to non-hashed On deleting a file that is undergoing promotion, the unlink call is not sent to the dst file as the hashed subvol == cached subvol. This causes the file to reappear once the migration is complete. This patch also fixes a problem with stale linkfile deleting. Backport of> >Change-Id: I4b02a498218c9d8eeaa4556fa4219e91e7fa71e5 >BUG: 1282390 >Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> >Reviewed-on: http://review.gluster.org/12829 >Tested-by: NetBSD Build System <jenkins@build.gluster.org> >Tested-by: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Dan Lambright <dlambrig@redhat.com> >Tested-by: Dan Lambright <dlambrig@redhat.com> (cherry picked from commit b5de382afa8c5777e455c7a376fc4f1f01d782d1) Change-Id: I951adb4d929926bcd646dd7574f7a2d41d57479d BUG: 1282388 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12991 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* glusterd/afr : Readdirp performance improvementAnuradha Talur2015-12-211-0/+2
| | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/#/c/12467/ Add xlator options to index xlator with xattrs that it needs to keep track of. Change-Id: If818673be5e626f77e65cc3a340f8cdd624179c2 BUG: 1287531 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12467 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12848
* glusterd: cli command implementation for bitrot scrub statusGaurav Kumar Garg2015-11-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is backport of: http://review.gluster.org/10231 CLI command for bitrot scrub status will be : gluster volume bitrot <volname> scrub status Above command will show the statistics of bitrot scrubber. Upon execution of this command it will show some common scrubber tunable value of volume <VOLNAME> followed by statistics of scrubber statistics of individual nodes. sample ouput for single node: Volume name : <VOLNAME> State of scrub: Active Scrub frequency: biweekly Bitrot error log location: /var/log/glusterfs/bitd.log Scrubber error log location: /var/log/glusterfs/scrub.log ========================================================= Node name: Number of Scrubbed files: Number of Unsigned files: Last completed scrub time: Duration of last scrub: Error count: ========================================================= This is just infrastructure. list of bad file, last scrub time, error count value will be taken care by http://review.gluster.org/#/c/12503/ and http://review.gluster.org/#/c/12654/ patches. >> Change-Id: I3ed3c7057c9d0c894233f4079a7f185d90c202d1 >> BUG: 1207627 >> Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> >> Reviewed-on: http://review.gluster.org/10231 >> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> >> Tested-by: NetBSD Build System <jenkins@build.gluster.org> >> Tested-by: Gluster Build System <jenkins@build.gluster.com> Change-Id: I45ed94e5e0e78a1e007c30eb0b252f74cf3c9187 BUG: 1283881 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Reviewed-on: http://review.gluster.org/12704 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* marker: do remove xattr only for last linkvmallika2015-11-091-3/+3
| | | | | | | | | | | | | | | | | | | | | | | This is a backport of http://review.gluster.org/#/c/12033/ With unlink, rename, rmdir, contribution xattrs are removed. If the file is a last link then remove_xattr will fail with ENOENT. So it better to perform remove_xattr only if there are more links to the file > Change-Id: Ifc1e7fda4d310fd87f6f28a635c9ea78b8f3929d > BUG: 1257694 > Signed-off-by: vmallika <vmallika@redhat.com> Change-Id: Icf5fdd86bbb8eef0adeb9518e89e5b612e9e0705 BUG: 1279331 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/12549 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* quota: add version to quota xattrsvmallika2015-11-021-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a backport of http://review.gluster.org/#/c/12386/ When a quota is disable and the clean-up process terminated without completely cleaning-up the quota xattrs. Now when quota is enabled again, this can mess-up the accounting A version number is suffixed for all quota xattrs and this version number is specific to marker xaltor, i.e when quota xattrs are requested by quotad/client marker will remove the version suffix in the key before sending the response > Change-Id: I1ca2c11460645edba0f6b68db70d476d8d26e1eb > BUG: 1272411 > Signed-off-by: vmallika <vmallika@redhat.com> > Reviewed-on: http://review.gluster.org/12386 > Tested-by: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Manikandan Selvaganesh <mselvaga@redhat.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: I67b1b930b28411d76b2d476a4e5250c52aa495a0 BUG: 1277080 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/12487 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* tier/ctr: CTR DB named lookup heal of cold tier during attach tierJoseph Fernandes2015-10-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Heal hardlink in the db for already existing data in the cold tier during attach tier. i.e during fix layout do lookup to files in the cold tier. CTR xlator on the brick/server side does db update/insert of the hardlink on a namelookup. Currently the namedlookup is done synchronous to the fixlayout that is triggered by attach tier. This is not performant, adding more time to fixlayout. The performant approach is record the hardlinks on a compressed datastore and then do the namelookup asynchronously later, giving the ctr db eventual consistency master patch : http://review.gluster.org/#/c/11828/ >>Change-Id: I4ffc337fffe7d447804786851a9183a51b5044a9 >>BUG: 1252586 >>Signed-off-by: Joseph Fernandes <josferna@redhat.com> >>Reviewed-on: http://review.gluster.org/11828 >>Tested-by: Gluster Build System <jenkins@build.gluster.com> >>Reviewed-by: Dan Lambright <dlambrig@redhat.com> >>Tested-by: Dan Lambright <dlambrig@redhat.com> Signed-off-by: Joseph Fernandes <josferna@redhat.com> Change-Id: I61b185a54ae4e8c1d82804b95a278bfbea870987 BUG: 1261146 Reviewed-on: http://review.gluster.org/12331 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* posix: xattrop 'GF_XATTROP_ADD_ARRAY_WITH_DEFAULT' implementationvmallika2015-09-281-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a backport of http://review.gluster.org/#/c/11702 Implementation of xattrop type: GF_XATTROP_ADD_ARRAY_WITH_DEFAULT GF_XATTROP_ADD_ARRAY64_WITH_DEFAULT These operations are similar to 'GF_XATTROP_ADD_ARRAY', except that it adds a default value if xattr is missing or its value is zero on disk. One use-case of this operation is in inode-quota. When a new directory is created, its default dir_count should be set to 1. So when a xattrop performed setting inode-xattrs, it should account initial dir_count 1 if the xattrs are not present Here is the usage of this operation value required in xdata for each key struct array { int32_t newvalue_1; int32_t newvalue_2; ... int32_t newvalue_n; int32_t default_1; int32_t default_2; ... int32_t default_n; }; or struct array { int32_t value_1; int32_t value_2; ... int32_t value_n; } data[2]; fill data[0] with new value to add fill data[1] with default value xattrop GF_XATTROP_ADD_ARRAY_WITH_DEFAULT for i from 1 to n { if (xattr (dest_i) is zero or not set in the disk) dest_i = newvalue_i + default_i else dest_i = dest_i + newvalue_i } value in xdata after xattrop is successful struct array { int32_t dest_1; int32_t dest_2; ... int32_t dest_n; }; > Change-Id: Ic6a08473e99fd98299a839d4d8416081a7534efd > BUG: 1243946 > Signed-off-by: vmallika <vmallika@redhat.com> > Reviewed-on: http://review.gluster.org/11702 > Tested-by: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: Ie0c8285d9d582afbc808b0fd878f6c02957ff928 BUG: 1266882 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/12241 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* fuse: add "resolve-gids" mount option to overcome 32-groups limitNiels de Vos2015-09-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a --resolve-gids commandline option to the glusterfs binary. This option gets set when executing "mount -t glusterfs -o resolve-gids ...". This option is most useful in combination with the "acl" mount option. POSIX ACL permission checking is done on the FUSE-client side to improve performance (in addition to the checking on the bricks). The fuse-bridge reads /proc/$PID/status by default, and this file contains maximum 32 groups. Any local (client-side) permission checking that requires more than the first 32 groups will fail. By enabling the "resolve-gids" option, the fuse-bridge will call getgrouplist() to retrieve all the groups from the user accessing the mountpoint. This is comparable to how "nfs.server-aux-gids" works. Note that when a user belongs to more than ~93 groups, the volume option server.manage-gids needs to be enabled too. Without this option, the RPC-layer will need to reduce the number of groups to make them fit in the RPC-header. Cherry picked from commit 64a5bf3749c67fcc00773a2716d0c7b61b0b4417: > Change-Id: I7ede90d0e41bcf55755cced5747fa0fb1699edb2 > BUG: 1246275 > Signed-off-by: Niels de Vos <ndevos@redhat.com> > Reviewed-on: http://review.gluster.org/11732 > Tested-by: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Ravishankar N <ravishankar@redhat.com> > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> > Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Change-Id: I7ede90d0e41bcf55755cced5747fa0fb1699edb2 BUG: 1246397 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11875 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* posix: xattrop 'GF_XATTROP_GET_AND_SET' implementationvmallika2015-08-271-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | This is a backport of http://review.gluster.org/#/c/11995 GF_XATTROP_GET_AND_SET stores the existing xattr value in xdata and sets the new value xattrop was reusing input xattr dict to set the results instead of creating new dict. This can be problem for server side xlators as the inout dict will have the value changed. > Change-Id: I43369082e1d0090d211381181e9f3b9075b8e771 > BUG: 1251454 > Signed-off-by: vmallika <vmallika@redhat.com> Change-Id: I7e0c27fd415131e9983a10d27067f63ed3a7701e BUG: 1257441 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/12022 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* features/bit-rot-stub: deny access to bad objectsRaghavendra Bhat2015-07-141-0/+3
| | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/11126 * Access to bad objects (especially operations such as open, readv, writev) should be denied to prevent applications from getting wrong data. * Do not allow anyone apart from scrubber to set bad object xattr. * Do not allow bad object xattr to be removed. Change-Id: I6903184ab64a9d1ea595330b603935979c33bc26 BUG: 1241529 Reviewed-on: http://review.gluster.org/11603 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* glusterd/ afr : set afr pending xattrs on replace brickAnuradha2015-06-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/10076/ This patch is part one change to prevent data loss in a replicate volume on doing a replace-brick commit force operation. Problem: After doing replace-brick commit force, there is a chance that self heal happens from the replaced (sink) brick rather than the source brick leading to data loss. Solution: During the commit phase of replace brick, after old brick is brought down, create a temporary mount and perform setfattr operation (on virtual xattr) indicating AFR to mark the replaced brick as sink. As a part of this change replace-brick command is being changed to use mgmt_v3 framework rather than op-state-machine framework. Many thanks to Krishnan Parthasarathi for helping me out on this. Change-Id: If0d51b5b3cef5b34d5672d46ea12eaa9d35fd894 BUG: 1232173 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/11253 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* storage/posix: Introduce flag instructing posix to perform prestat, writev ↵Krutika Dhananjay2015-06-271-0/+1
| | | | | | | | | | | | | | and poststat atomically Backport of: http://review.gluster.org/11345 Change-Id: I0d7e3a851c744777083974ec4cdb01b08c23727b BUG: 1236271 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/11439 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/dht: Fix dht_setxattr to follow files under migrationNithya Balachandran2015-06-041-2/+3
| | | | | | | | | | | | | | | | If a file is under migration, then any xattrs created on it are lost post migration of the file. This is because the xattrs are set only on the cached subvol of the source and as the source is under migration, it becomes a linkto file post migration. Change-Id: Ib8e233b519cf954e7723c6e26b38fa8f9b8c85c0 BUG: 1225839 Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/10968 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* features/bitrot: Follow xattr naming conventionsVenky Shankar2015-05-101-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | Instead of "trusted.glusterfs.bit-rot.*" use "trusted.bit-rot.*" NOTE: With this patch, data on existing volumes would be resigned (which should be OK as of now since we do not expect many users as of now :-)) > Change-Id: I926c7bca266a9c8f2cb35d57c4d0359aa5cecfa0 > BUG: 1170075 > Signed-off-by: Venky Shankar <vshankar@redhat.com> > Reviewed-on: http://review.gluster.org/10181 > Tested-by: NetBSD Build System > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Vijay Bellur <vbellur@redhat.com> Change-Id: I3c18d7dc2db4beaca6e8d8d231b4171a7b18795f Signed-off-by: Venky Shankar <vshankar@redhat.com> BUG: 1220041 Reviewed-on: http://review.gluster.org/10718 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
* core: Global timer-wheelVenky Shankar2015-05-101-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instantiate a process wide global instance of the timer wheel data structure. Spawning glusterfs* process with option arg "--global-timer-wheel" instantiates a global instance of timer-wheel under global context (->ctx). Translators can make use of this process wide instance [via a call to glusterfs_global_timer_wheel()] instead of maintaining an instance of their own and possibly consuming more memory. Linux kernel too has a single instance of timer wheel where subsystems such as IO, networking, etc.. make use of. Bitrot daemon would be early consumers of this: bitrot translator instances for multiple volumes would track objects belonging to their respective bricks in this global expiry tracking data structure. This is also a first step to move GlusterFS timer mechanism to use timer-wheel. > Change-Id: Ie882df607e07acaced846ea269ebf1ece306d6ae > BUG: 1170075 > Signed-off-by: Venky Shankar <vshankar@redhat.com> > Reviewed-on: http://review.gluster.org/10380 > Tested-by: NetBSD Build System > Reviewed-by: Vijay Bellur <vbellur@redhat.com> > Tested-by: Gluster Build System <jenkins@build.gluster.com> Change-Id: I35c840daa9996a059699f8ea5af54c76ede7e09c Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> BUG: 1220041 Reviewed-on: http://review.gluster.org/10716 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* dht: make lookup-unhashed=auto do something actually usefulJeff Darcy2015-05-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | The key concept here is to determine whether a directory is "clean" by comparing its last-known-good topology to the current one for the volume. These are stored as "commit hashes" on the directory and the volume root respectively. The volume's commit hash changes whenever a brick is added or removed, and a fix-layout is done. A directory's commit hash changes only when a full rebalance (not just fix-layout) is done on it. If all bricks are present and have a directory commit hash that matches the volume commit hash, then we can assume that every file is in its "proper" place. Therefore, if we look for a file in that proper place and don't find it, we can assume it's not on any other subvolume and *safely* skip the global (broadcast to all) lookup. Change-Id: Id6ce4593ba1f7daffa74cfab591cb45960629ae3 BUG: 1220064 Reviewed-on-master: http://review.gluster.org/#/c/7702/ Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Signed-off-by: Shyam <srangana@redhat.com> Reviewed-on: http://review.gluster.org/10729 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr : Prevent inode-evict during split-brain resolutionAnuradha2015-05-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/#/c/10134/ 1) Provided setfattr command to set timeout for split-brain choice. 2) If split-brain inspection/resolution is being done from the mount for a file, ref the inode when split-brain-choice is set. This inode will be unconditionally unref-ed after timeout seconds set by the user/default otherwise. 3) Updated the doc and testcase to reflect the changes. Change-Id: I15c9037dee28855f21e680e7e3632e1f48dba4e1 BUG: 1219388 Reviewed-on: http://review.gluster.org/10134 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/10679
* geo-rep: rename handling in dht volumeNithya Balachandran2015-05-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Background: Glusterfs changelogs are stored in each brick, which records the changes happened in that brick. Georep will run in all the nodes of master and processes changelogs "independently". Processing changelogs is in brick level, but all the fops will be replayed on "slave mount" point. Problem: With a DHT volume, in changelog "internal fops" are NOT recorded. For Rename case, Rename is recorded in "hashed" brick changelog. (DHT's internal fops like creating linkto file, unlink is NOT recorded). This lead us to inconsistent rename operations. For example, Distribute volume created with Two bricks B1, B2. //Consider master volume mounted @ /mnt/master and following operations executed: cd /mnt/master touch f1 // f1 falls on B1 Hash mv f1 f2 // f2 falls on B2 Hash // Here, Changelogs are recorded as below: @B1 CREATE f1 @B2 RENAME f1 f2 Here, race exists between Brick B1 and B2, say B2 will get executed first. Source file f1 itself is "NOT PRESENT", so it will go ahead and create f2 (Current implementation). We have this problem When rename falls in another brick and file is unlinked in Master. Similar kind of issue exists in following case too(multiple rename): CREATE f1 RENAME f1 f2 RENAME f2 f1 Solution: Instead of carrying out "changelogging" at "HASHED volume", carry out at the "CACHED volume". This way we have rename operations carried out where actual files are present. So,Changelog recorded as : @B1 CREATE f1 RENAME f1 f2 credit: sarumuga@redhat.com PS: Some of the races as the one below are _NOT_ fixed by this patch * f1 and f2 exist. B1 and B2 are their respective cached subvols. For both files hashed-subvol == cached-subvol * mv f1 f2 on master. * B1 has change-log entry of rename f1 f2 * rebalance migrates f2 from B1 and B2 * mv f2 f1 on master. * B2 has change-log entry of rename f2 f1 Since changelog entries (rename f1 f2) and (rename f2 f1) are processed independently by gsyncds, which of either f1 and f2 survives on slave is subject to race. Note that on master its file f1 with name f1 which survived. On slave it can be either file f1 with name f1 or file f2 with name f2 based on who wins the race of processing changelog. BUG: 1219412 Change-Id: I43725d69635e2ce065135691ef629014e8df7d50 Original-Author: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/10410 Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/10628 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Upcall: Send stat as part of cache_invalidation notificationsSoumya Koduri2015-05-071-18/+0
| | | | | | | | | | | | | | | | | Have added support to send attributes of both entries and its parent (include oldparent in case of RENAME fop) in the same notification request to avoid multiple rpc requests. Also, made changes in gfapi to send parent object and its attributes changed in a single upcall event. Change-Id: I92833da3bcec38d65216921c2ce4d10367c32ef1 BUG: 1217711 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/10568 Tested-by: NetBSD Build System Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* Upcall: Process each of the upcall events separatelySoumya Koduri2015-05-071-3/+13
| | | | | | | | | | | | | | | | | | As suggested during the code-review of Bug1200262, have modified GF_CBK_UPCALL to be exlusively GF_CBK_CACHE_INVALIDATION. Thus, for any new upcall event, a new CBK procedure will be added. Also made changes to store upcall data separately based on the upcall event type received. BUG: 1217711 Change-Id: I0f5e53d6f5ece16aecb514a0a426dca40fa1c755 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/10049 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/10562 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com>
* rebalance: Introducing local crawl and parallel migrationSusant Palai2015-05-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current patch address two part of the design proposed. 1. Rebalance multiple files in parallel 2. Crawl only bricks that belong to the current node Brief design explanation for the above two points. 1. Rebalance multiple files in parallel: ------------------------------------- The existing rebalance engine is single threaded. Hence, introduced multiple threads which will be running parallel to the crawler. The current rebalance migration is converted to a "Producer-Consumer" frame work. Where Producer is : Crawler Consumer is : Migrating Threads Crawler: Crawler is the main thread. The job of the crawler is now limited to fix-layout of each directory and add the files which are eligible for the migration to a global queue in a round robin manner so that we will use all the disk resources efficiently. Hence, the crawler will not be "blocked" by migration process. Producer: Producer will monitor the global queue. If any file is added to this queue, it will dqueue that entry and migrate the file. Currently 20 migration threads are spawned at the beginning of the rebalance process. Hence, multiple file migration happens in parallel. 2. Crawl only bricks that belong to the current node: -------------------------------------------------- As rebalance process is spawned per node, it migrates only the files that belongs to it's own node for the sake of load balancing. But it also reads entries from the whole cluster, which is not necessary as readdir hits other nodes. New Design: As part of the new design the rebalancer decides the subvols that are local to the rebalancer node by checking the node-uuid of root directory prior to the crawler starts. Hence, readdir won't hit the whole cluster as it has already the context of local subvols and also node-uuid request for each file can be avoided. This makes the rebalance process "more scalable". Change-Id: I6f1b44086a09df8ca23935fd213509c70cc0c050 BUG: 1217381 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/10466 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: N Balachandran <nbalacha@redhat.com>
* glusterd: remove replace brick with data migration support form cli/glusterdGaurav Kumar Garg2015-05-071-1/+0
| | | | | | | | | | | | | | | | | Replace-brick operation with data migration support have been deprecated from gluster. With this fix replace brick command will support only one commad gluster volume replace-brick <VOLNAME> <SOURCE-BRICK> <NEW-BRICK> {commit force} Change-Id: Ib81d49e5d8e7eaa4ccb5830cfec2bc081191b43b BUG: 1218602 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Reviewed-on: http://review.gluster.org/10577 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Kaushal M <kaushal@redhat.com>
* bitrot/scrub: Scrubber fixesVenky Shankar2015-04-081-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a handful of problem with scrubber which are detailed below. Scrubber used to skip objects for verification due to missing fd iterface to fetch versioning extended attributes. Similar to the inode interface, an fd based interface in POSIX is now introduced. Moreover, this patch also fixes potential false reporting by scrubber due to: An object gets dirtied and signed when scrubber is busy calculatingobject checksum. This is fixed by caching the signed version when an object is first inspected for stalenes, i.e., during pre-compute stage. This version is used to verify checksum in the post-compute stage when the signatures are compared for possible corruption. Side effect of _not_ sending signature length during signing resulted in "truncated" signature to be set for an object. Now, at the time of signing, the signature length is sent and is used in place of invoking strlen() to get signature length (which could have possible 00s). The signature length itself is not persisted in the signature xattr, but is calculated on-the-fly by substracting the xattr length by the "structure" header size. Some of the log entries are made more meaningful (as and aid for debugging). Change-Id: I938bee5aea6688d5d99eb2640053613af86d6269 BUG: 1207624 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10118 Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/ec: Implement heal info for ecPranith Kumar K2015-03-301-1/+1
| | | | | | | | | | | | This also lists the files that are on-going I/O, which will be fixed later. Change-Id: Ib3f60a8b7e8798d068658cf38eaef2a904f9e327 BUG: 1203581 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10020 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* Bitrot StubVenky Shankar2015-03-241-0/+17
| | | | | | | | | | | | | Bitrot stub implements object versioning required for identifying signature freshness. More details about versioning is explained as a part of the "bitrot feature documentation" patch. Change-Id: I2ad70d9eb109ba4a12148ab8d81336afda529ad9 BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9709 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr : enable inspection & resolution of files in split-brainAnuradha2015-03-191-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Part 2/2 patch to enable users analyze and resolve split-brain. This patch enables : 1) Users to inspect the files in data and metadata split-brain. 2) Resolve the split-brain. Both using a series of setfattr commands. Consider a volume "test" with 2 bricks. 1) To inspect a file f1: setfattr -n replica.split-brain-choice -v test-client-0 f1 After the execution of this command, if no read_subvol is found, reads will be served from test-client-0 (corresponding to brick-0). 2) To resolve split-brain : setfattr -n replica.split-brain-heal-finalize -v test-client-0 f1 Execution of this command will lead to the resolution of data and metadata split-brain with subvol mentioned in the command (test-client-0 here) as the source and the rest as sink. Change-Id: Ia20f3ee5abd3119e3d54fcc599f1e55ac65fd179 BUG: 1191396 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9743 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* Adding ChangeTimeRecorder(CTR) Xlator to GlusterFSJoseph Fernandes2015-03-191-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ********************************************************************** ChangeTimeRecorder(CTR) Xlator | ********************************************************************** ChangeTimeRecorder(CTR) is server side xlator(translator) which sits just above posix xlator. The main role of this xlator is to record the access/write patterns on a file residing the brick. It records the read(only data) and write(data and metadata) times and also count on how many times a file is read or written. This xlator also captures the hard links to a file(as its required by data tiering to move files). CTR Xlator is the consumer of libgfdb. To Enable/Disable CTR Xlator: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ gluster volume set <volume-name> features.ctr-enabled {on/off} To Enable/Disable Frequency Counter Recording in CTR Xlator: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ gluster volume set <volume-name> features.record-counters {on/off} Change-Id: I5d3cf056af61ac8e3f8250321a27cb240a214ac2 BUG: 1194753 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/9935 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterfsd: add "print-netgroups" and "print-exports" commandNiels de Vos2015-03-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | NFS now has the ability to use a separate file for "netgroups" and "exports". An administrator should have the ability to check the validity of the files before applying the configuration. The "glusterfsd" command now has the following additional arguments that can be used to check the configuration: --print-netgroups: Validate the netgroups file and print it out --print-exports: Validate the exports file and print it out BUG: 1143880 Change-Id: I24c40d50110d49d8290f9fd916742f7e4d0df85f URL: http://www.gluster.org/community/documentation/index.php/Features/Exports_Netgroups_Authentication Original-author: Shreyas Siravara <shreyas.siravara@gmail.com> CC: Richard Wareing <rwareing@fb.com> CC: Jiffin Tony Thottan <jthottan@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/9365 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* features/quota : Introducing inode quotavmallika2015-03-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ========================================================================== Inode quota ========================================================================== = Currently, the only way to retrieve the number of files/objects in a = = directory or volume is to do a crawl of the entire directory/volume. = = This is expensive and is not scalable. = = = = The proposed mechanism will provide an easier alternative to determine = = the count of files/objects in a directory or volume. = = = = The new mechanism proposes to store count of objects/files as part of = = an extended attribute of a directory. Each directory's extended = = attribute value will indicate the number of files/objects present = = in a tree with the directory being considered as the root of the tree. = = = = The count value can be accessed by performing a getxattr(). = = Cluster translators like afr, dht and stripe will perform aggregation = = of count values from various bricks when getxattr() happens on the key = = associated with file/object count. = A new interface is introduced: ------------------------------ limit-objects : limit the number of inodes at directory level list-objects : list the directories where the limit is set remove-objects : remove the limit from the directory ========================================================================== CLI COMMAND: gluster volume quota <volname> limit-objects <path> <number> [<percent>] * <number> is a hard-limit for number of objects limitation for path "<path>" If hard-limit is exceeded, creation of file/directory is no longer permitted. * <percent> is a soft-limit for number of objects creation for path "<path>" If soft-limit is exceeded, a warning is issued for each creation. CLI COMMAND: gluster volume quota <volname> remove-objects [path] ========================================================================== CLI COMMAND: gluster volume quota <volname> list-objects [path] ... Sample output: ------------------ Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded? ------------------------------------------------------------------------ -------------------------------------- /dir 10 80% 10 0 Yes Yes ========================================================================== [root@snapshot-28 dir]# ls a b file11 file12 file13 file14 file15 file16 file17 [root@snapshot-28 dir]# touch a1 touch: cannot touch `a1': Disk quota exceeded * Nine files are created in directory "dir" and directory is included in * the count too. Hence the limit "10" is reached and further file creation fails ========================================================================== Note: We have also done some re-factoring in cli for volume name validation. New function cli_validate_volname is created ========================================================================== Change-Id: I1823497de4f790a2a20ebb1770293472ea33ee2b BUG: 1190108 Signed-off-by: Sachin Pandit <spandit@redhat.com> Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/9769 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Change the subvolume encoding in d_off to be a "global"Dan Lambright2015-03-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | position in the graph rather than relative (local) to a particular translator. Encoding the volume in this way allows a single translator to manage which brick is currently being scanned for directory entries. Using a single translator minimizes allocated bits in the d_off. It also allows multiple DHT translators in the same graph to have a common frame of reference (the graph position) for which brick is being read. Multiple DHT translators are needed for the Tiering feature. The fix builds off a previous change (9332) which removed subvolume encoding from AFR. The fix makes an equivalent change to the EC translator. More background can be found in fix 9332 and gluster-dev discussions [1]. DHT and AFR/EC are responsibile (as before) for choosing which brick to enumerate directory entries in over the readdir lifecycle. The client translator receiving the readdir fop encodes the dht_t. It is referred to as the "leaf node" in the graph and corresponds to the brick being scanned. When DHT decodes the d_off, it translates the leaf node to a local subvolume, which represents the next node in the graph leading to the brick. Tracking of leaf nodes is done in common utility functions. Leaf nodes counts and positional information are updated on a graph switch. [1] www.gluster.org/pipermail/gluster-devel/2015-January/043592.html Change-Id: Iaf0ea86d7046b1ceadbad69d88707b243077ebc8 BUG: 1190734 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/9688 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* Quota/marker : Support for inode quotavmallika2015-03-171-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the only way to retrieve the number of files/objects in a directory or volume is to do a crawl of the entire directory/volume. This is expensive and is not scalable. The new mechanism proposes to store count of objects/files as part of an extended attribute of a directory. Each directory's extended attribute value will indicate the number of files/objects present in a tree with the directory being considered as the root of the tree. Currently file usage is accounted in marker by doing multiple FOPs like setting and getting xattrs. Doing this with STACK WIND and UNWIND can be harder to debug as involves multiple callbacks. In this code we are replacing current mechanism with syncop approach as syncop code is much simpler to follow and help us implement inode quota in an organized way. Change-Id: Ibf366fbe07037284e89a241ddaff7750fc8771b4 BUG: 1188636 Signed-off-by: vmallika <vmallika@redhat.com> Signed-off-by: Sachin Pandit <spandit@redhat.com> Reviewed-on: http://review.gluster.org/9567 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* every/where: add GF_FOP_IPC for inter-translator communicationJeff Darcy2015-03-171-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several features - e.g. encryption, erasure codes, or NSR - involve multiple cooperating translators which sometimes need a "private" means of communication amongst themselves. Historically we've used virtual or synthetic xattrs, but that's not very elegant and clutters up the getxattr/setxattr path which must also handle real xattr requests. This new fop should address that. The only argument is an int32_t "op" which should be recognized by the target translator. It is recommended that translators using these feature follow some convention regarding the ops that they define, to avoid conflicts. Using a hash of the target translator's type string as a base for a series of ops would probably be a good start. Any other information can be passed in both directions using xdata. The default behavior for this fop, as with any other, is to pass through to FIRST_CHILD. That makes use of this fop "transparent" to other translators that were written before it existed, but it also means that it only really works with pass-through translators. If a routing translator (such as DHT) or a fan-out translator (such as AFR) is involved, the IPC might not reach its intended destination unless those translators are modified to forward IPC fops along all paths. If an IPC gets all the way to storage/posix it is considered an error, much like an uncaught exception. We don't actually *do* anything in that case, but we do log it send back an EOPNOTSUPP error. This makes the "unrecognized opcode" condition distinguishable from the "no IPC support" condition (which would yield an RPC error instead) so clients can probe for the presence of a handler for their own favorite opcode and either use that or use old-school xattrs depending on the result. BUG: 1158628 Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Change-Id: I84af1b17babe5b30ec03ecf027ae37d09b873968 Reviewed-on: http://review.gluster.org/8812 Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Upcall: New xlator to store various states and send cbk eventsSoumya Koduri2015-03-171-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | Framework on the server-side, to handle certain state of the files accessed and send notifications to the clients connected. A generic and extensible framework, used to maintain states in the glusterfsd process for each of the files accessed (including the clients info doing the fops) and send notifications to the respective glusterfs clients incase of any change in that state. This patch handles "Inode Update/Invalidation" upcall event. Feature page: URL: http://www.gluster.org/community/documentation/index.php/Features/Upcall-infrastructure Below link has a writeup which explains the code changes done - URL: https://soumyakoduri.wordpress.com/2015/02/25/glusterfs-understanding-upcall-infrastructure-and-cache-invalidation-support/ Change-Id: Ie3d724be9a3419fcf18901a753e8ec2df2ac802f BUG: 1200262 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/9535 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* libglusterfs: Add functions for xlator and graph cleanup.Poornima G2015-03-021-0/+2
| | | | | | | | | | Change-Id: If341e3c0a559aa5bbca9c1263a241c6592c59706 BUG: 1093594 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/9696 Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr : provide split-brain info by using getxattrAnuradha2015-02-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | This patch is one part to enable users analyze and resolve split-brain. Problem : To know if a file is in data/metadata split-brain Solution : Performing "getfattr -n afr.split-brain-status <path-to-file>" from the mount provides this information. Also provides the list of afr children to analyse to get more information. Change-Id: I4d9b429794759a906371416cb84c84a212e2c7b9 BUG: 1191396 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9633 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* protocol/client: sequence CHILD_UP, CHILD_DOWN etc notificationsKrishnan Parthasarathi2015-02-071-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... from all bricks in the volume This patch is important in the context of MT epoll. With MT epoll, notification events from client xlators could reach cluster xlators like afr, dht, ec, stripe etc. in different orders. For e.g, In a distributed replicate volume of 2 bricks, namely Brick1 and Brick2, the following network events are observed by a mount process. - connection to Brick1 is broken. - connection to Brick1 has been restored. - connection to Brick2 is broken. - connection to Brick2 has been restored. Without establishing a total ordering of events, we can't guarantee that cluster xlators like afr, dht perceive them in the same order. While we would expect afr (say) to perceive it as only one of Brick1 and Brick2 going down at any given time, it is possible for the notification of Brick2 going offline to race with the notification of Brick1 coming back online. Change-Id: I78f5a52bfb05593335d0e9ad53ebfff98995593d BUG: 1104462 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/9591 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* epoll: Adding the ability to configure epoll threadsShyam2015-02-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the ability to configure the number of event threads for various gluster services. Currently with the multi thread epoll patch, it is possible to have more than one thread waiting on socket activity and processing the same. This thread count is currently static, which this commit makes dynamic. The current services which use IO path, i.e brick processes, any client process (nfs, FUSE, gfapi, heal, rebalance, etc.a), gain 2 set parameters to control the number of threads that are processing events. These settings are, - client.event-threads <n> - server.event-threads <n> The client setting affects the client graph consumers, and the server setting affects the brick processes. These are processed and inited/reconfigured using the client/server protocol xlators. Other services (say glusterd) would need to extend similar configuration settings to take advantage of multi threaded event processing. At present glusterd is not enabled with this commit, as it does not stand to gain from this multi-threading (as I understand it). Change-Id: Id8422fc57a9f95a135158eb6477ccf9d3c9ea4d9 BUG: 1104462 Signed-off-by: Shyam <srangana@redhat.com> Reviewed-on: http://review.gluster.org/9488 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* uss: disable memory accounting for the snapshot daemonRaghavendra Bhat2015-01-281-0/+1
| | | | | | | | | | | | | | | | | | * Bring in option to disable memory accounting for a glusterfs process This reverses the changes done by the commit 7fba3a88f1ced610eca0c23516a1e720d75160cd. * Change the key from "memory-accounting" to "no-memory-accounting", as by default all the glusterfs process enable memory accounting now. So to disable memory accounting for some process, "no-mem-accounting" argument has to be passed. Change-Id: I39c7cefb0fe764ea3e48f4e73e1305b084c5f497 BUG: 1184366 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/9469 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: split-brain resolution CLIRavishankar N2015-01-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the AFR heal command to include automated split-brain resolution. This patch [3/3] is the final patch for afr automated split-brain resolution implementation. "gluster volume heal <VOLNAME> [full | statistics [heal-count [replica <HOSTNAME:BRICKNAME>]] |info [healed | heal-failed | split-brain]| split-brain {bigger-file <FILE> |source-brick <HOSTNAME:BRICKNAME> [<FILE>]}]" The new additions being: 1.gluster volume heal <VOLNAME> split-brain bigger-file <FILE> Locates the replica containing the FILE, selects bigger-file as source and completes heal. 2.gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE> Selects <FILE> present in <HOSTNAME:BRICKNAME> as source and completes heal. 3.gluster volume heal <VOLNAME> split-brain <HOSTNAME:BRICKNAME> Selects all split-brained files in <HOSTNAME:BRICKNAME> as source and completes heal. Note: <FILE> can be either the full file name as seen from the root of the volume (or) the gfid-string representation of the file, which sometimes gets displayed in the heal info command's output. Entry/gfid split-brain resolution is not supported. Example can be found in the test case. Change-Id: I4649733922d406f14f28ee9033a5cb627b9538b3 BUG: 1136769 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9377 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* feature/changelog: Virtual xattr to trigger explicit sync in geo-rep.Kotresh HR2014-12-291-0/+1
| | | | | | | | | | | | | | | | | | | A virtual xattr "glusterfs.geo-rep.trigger-sync" is provided in glusterfs through changelog translator. Geo-rep triggers a explicit data sync on setting this xattr on a file. Changelog captures a DATA entry on file's gfid on setting this virtual xattr on a file. This is supported only for files. It doesn't support directories. Usage: setfattr -n glusterfs.geo-rep.trigger-sync <file-path> Change-Id: Ia689326ac2dcb31035ffbecad2c548eda4eb9245 BUG: 1176934 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/9337 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* cluster/afr : Change in volume heal info commandAnuradha2014-12-231-0/+2
| | | | | | | | | | | | | | | | gluster volume heal <volname> info command will now also display if the files listed (in the output of the command) are in split-brain or possibly being healed. This patch also fixes build warning that occurs. Change-Id: I1fc92e62137f23b2b9ddf6e05819cee6230741d1 BUG: 1163804 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9119 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>