summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* cluster/afr : enable inspection & resolution of files in split-brainAnuradha2015-03-198-34/+431
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Part 2/2 patch to enable users analyze and resolve split-brain. This patch enables : 1) Users to inspect the files in data and metadata split-brain. 2) Resolve the split-brain. Both using a series of setfattr commands. Consider a volume "test" with 2 bricks. 1) To inspect a file f1: setfattr -n replica.split-brain-choice -v test-client-0 f1 After the execution of this command, if no read_subvol is found, reads will be served from test-client-0 (corresponding to brick-0). 2) To resolve split-brain : setfattr -n replica.split-brain-heal-finalize -v test-client-0 f1 Execution of this command will lead to the resolution of data and metadata split-brain with subvol mentioned in the command (test-client-0 here) as the source and the rest as sink. Change-Id: Ia20f3ee5abd3119e3d54fcc599f1e55ac65fd179 BUG: 1191396 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9743 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: CLI commands to create and manage tiered volumes.Dan Lambright2015-03-1915-43/+821
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A tiered volume is a normal volume with some number of new bricks representing "hot" storage. The "hot" bricks can be attached or detached dynamically to a normal volume. When this happens, a new graph is constructed. The root of the new graph is an instance of the tier translator. One subvolume of the tier translator leads to the old volume, and another leads to the new hot bricks. attach-tier <VOLNAME> [<replica> <COUNT>] <NEW-BRICK> ... [force] volume detach-tier <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force> gluster volume rebalance <volume> tier start gluster volume rebalance <volume> tier stop gluster volume rebalance <volume> tier status The "tier start" CLI command starts a server side daemon. The daemon initiates file level migration based on caching policies. The daemon's status can be monitored and stopped. Note development on the "tier status" command is incomplete. It will be added in a subsequent patch. When the "hot" storage is detached, the tier translator is removed from the graph and the tiered volume reverts to its original state as described in the volume's info file. For more background and design see the feature page [1]. [1] http://www.gluster.org/community/documentation/index.php/Features/data-classification Change-Id: Ic8042ce37327b850b9e199236e5be3dae95d2472 BUG: 1194753 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/9753 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* doc: heal info and split-brain resolution documentAnuradha2015-03-191-0/+448
| | | | | | | | | | | | | Documentation for gluster volume heal info and split-brain resolution. Change-Id: Iacb30dff939c1acf06c24bd8844294d4a90ab38d BUG: 1154599 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9595 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* gfapi: space/tab conversion in gfapi.mapNiels de Vos2015-03-191-5/+5
| | | | | | | | | | | | | The last two patches incorrectly add symbols indented by spaces. The rest of the file uses tabs. This change corrects these occurences. Change-Id: Ibfb057b78c1203a594bfeb73a2955e798e86c8e1 BUG: 1185654 Reported-by: Kaleb S. Keithley <kkeithle@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/9937 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* Adding ChangeTimeRecorder(CTR) Xlator to GlusterFSJoseph Fernandes2015-03-1914-12/+2112
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ********************************************************************** ChangeTimeRecorder(CTR) Xlator | ********************************************************************** ChangeTimeRecorder(CTR) is server side xlator(translator) which sits just above posix xlator. The main role of this xlator is to record the access/write patterns on a file residing the brick. It records the read(only data) and write(data and metadata) times and also count on how many times a file is read or written. This xlator also captures the hard links to a file(as its required by data tiering to move files). CTR Xlator is the consumer of libgfdb. To Enable/Disable CTR Xlator: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ gluster volume set <volume-name> features.ctr-enabled {on/off} To Enable/Disable Frequency Counter Recording in CTR Xlator: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ gluster volume set <volume-name> features.record-counters {on/off} Change-Id: I5d3cf056af61ac8e3f8250321a27cb240a214ac2 BUG: 1194753 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/9935 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterfsd: add "print-netgroups" and "print-exports" commandNiels de Vos2015-03-1816-14/+385
| | | | | | | | | | | | | | | | | | | | | | | NFS now has the ability to use a separate file for "netgroups" and "exports". An administrator should have the ability to check the validity of the files before applying the configuration. The "glusterfsd" command now has the following additional arguments that can be used to check the configuration: --print-netgroups: Validate the netgroups file and print it out --print-exports: Validate the exports file and print it out BUG: 1143880 Change-Id: I24c40d50110d49d8290f9fd916742f7e4d0df85f URL: http://www.gluster.org/community/documentation/index.php/Features/Exports_Netgroups_Authentication Original-author: Shreyas Siravara <shreyas.siravara@gmail.com> CC: Richard Wareing <rwareing@fb.com> CC: Jiffin Tony Thottan <jthottan@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/9365 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* contrib/timer-wheel: import linux kernel timer-wheelVenky Shankar2015-03-185-2/+444
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch imports timer-wheel[1] algorithm from the linux kernel (~/kernel/time/timer.c) with some modifications. Timer-wheel is an efficent way to track millions of timers for expiry. This is a variant of the simple but RAM heavy approach of having a list (timer bucket) for every future second. Timer-wheel categorizes every future second into a logarithmic array of arrays. This is done by splitting the 32 bit "timeout" value into fixed "sliced" bits, thereby each category has a fixed size array to which buckets are assigned. A classic split would be 8+6+6+6 (used in this patch) which results in 256+64+64+64 == 512 buckets. Therefore, the entire 32 bit futuristic timeouts have been mapped into 512 buckets. [ NOTE: There are other possible splits, such as "8+8+8+8", but this patch sticks to the widely used and tested default. ] Therfore, the first category "holds" timers whose expiry range is between 1..256, the next cateogry holds 257..16384, third category 16385..1048576 and so on. When timers are added, unless it's in the first category, timers with different timeouts could end up in the same bucket. This means that the timers are "partially sorted" -- sorted in their highest bits. The expiry code walks the first array of buckets and exprires any pending timers (1..256). Next, at time value 257, timers in the first bucket of the second array is "cascaded" onto the first category and timers are placed into respective buckets according to the thier timeout values. Cascading "brings down" the timers timeout to the coorect bucket of their respective category. Therefore, timers are sorted by their highest bits of the timeout value and then by the lower bits too. [1] https://lwn.net/Articles/152436/ Change-Id: I1219abf69290961ae9a3d483e11c107c5f49c4e3 BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9707 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Ignore .trashcan during xsync crawl.Kotresh HR2015-03-181-0/+1
| | | | | | | | | | | | | | | | With trash feature, .trashcan directory gets created at each export directory. Xsync picks .trashcan to sync and fails with EPERM. Xsync should ignore .trashcan directory. Change-Id: I45bd226c96011ace2c40dd2de878d886c7d34ce5 BUG: 1203293 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/9934 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* libglusterfs/rot-buffs: rotational buffersVenky Shankar2015-03-186-14/+661
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces rotational buffers aiming at the classic multiple producer and multiple consumer problem. A fixed set of buffer list is allocated during initialization, where each list consist of a list of buffers. Each buffer is an iovec pointing to a memory region of fixed allocation size. Multiple producers write data to these buffers. A buffer list starts with a single buffer (iovec) and allocates more when required (although this can be preallocatd in multiples of k). rot-buffs allow multiple producers to write data parallely with a bit of extra cost of taking locks. Therefore, it's much suited for large writes. Multiple producers are allowed to write in the buffer parallely by "reserving" write space for selected number of bytes and returning pointer to the start of the reserved area. The write size is selected by the producer before it starts the write (which is often known). Therefore, the write itself need not be serialized -- just the space reservation needs to be done safely. The other part is when a consumer kicks in to consume what has been produced. At this point, a buffer list switch is performed. The "current" buffer list pointer is safely pointed to the next available buffer list. New writes are now directed to the just switched buffer list (the old buffer list is now considered out of rotation). Note that the old buffer still may have producers in progress (pending writes), so the consumer has to wait till the writers are drained. Currently this is the slow path for producers (write completion) and needs to be improved. Currently, there is special handling for cases where the number of consumers match (or exceed) the number of producers, which could result in writer starvation. In this scenario, when a consumers requests a buffer list for consumption, a check is performed for writer starvation and consumption is denied until at least another buffer list is ready of the producer for writes, i.e., one (or more) consumer(s) completed, thereby putting the buffer list back in rotation. [ NOTE: I've not performance tested this producer-consumer model yet. It's being used in changelog for event notification. The list of buffers (iovecs) are directly passed to RPC layer. ] Change-Id: I88d235522b05ab82509aba861374a2312bff57f2 BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9706 Tested-by: Vijay Bellur <vbellur@redhat.com>
* cli/glusterd: cli command implementation for bitrot featuresGaurav Kumar Garg2015-03-1817-3/+845
| | | | | | | | | | | | | | | | | CLI command for bitrot features. volume bitrot <volname> enable|disable Above command will enable/disable bitrot feature for particular volume. BUG: 1170075 Change-Id: Ie84002ef7f479a285688fdae99c7afa3e91b8b99 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Signed-off-by: Anand nekkunti <anekkunt@redhat.com> Signed-off-by: Dominic P Geevarghese <dgeevarg@redhat.com> Reviewed-on: http://review.gluster.org/9866 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* snapshot/scheduling: A cron based scheduler for snapshot schedulingAvra Sengupta2015-03-187-1/+811
| | | | | | | | | | | | | | | | | | | | | | | | | | GlusterFS volume snapshot provides point-in-time copy of a GlusterFS volume. Currently, GlusterFS volume snapshots can be easily scheduled by setting up cron jobs on one of the nodes in the GlusterFS trusted storage pool. This has a single point failure (SPOF), as scheduled jobs can be missed if the node running the cron jobs dies. The solution to the above problems is addressed in this patch. The snap_scheduler.py helper script expects the user to install the argparse python module before using it. Further details for the same are available at: http://www.gluster.org/community/documentation/index.php/Features/Scheduling_of_Snapshot Change-Id: I2c357af5b7d3e66f270d20eef50cdeecdcbe15c7 BUG: 1198027 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/9788 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/quota : Introducing inode quotavmallika2015-03-1825-684/+1385
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ========================================================================== Inode quota ========================================================================== = Currently, the only way to retrieve the number of files/objects in a = = directory or volume is to do a crawl of the entire directory/volume. = = This is expensive and is not scalable. = = = = The proposed mechanism will provide an easier alternative to determine = = the count of files/objects in a directory or volume. = = = = The new mechanism proposes to store count of objects/files as part of = = an extended attribute of a directory. Each directory's extended = = attribute value will indicate the number of files/objects present = = in a tree with the directory being considered as the root of the tree. = = = = The count value can be accessed by performing a getxattr(). = = Cluster translators like afr, dht and stripe will perform aggregation = = of count values from various bricks when getxattr() happens on the key = = associated with file/object count. = A new interface is introduced: ------------------------------ limit-objects : limit the number of inodes at directory level list-objects : list the directories where the limit is set remove-objects : remove the limit from the directory ========================================================================== CLI COMMAND: gluster volume quota <volname> limit-objects <path> <number> [<percent>] * <number> is a hard-limit for number of objects limitation for path "<path>" If hard-limit is exceeded, creation of file/directory is no longer permitted. * <percent> is a soft-limit for number of objects creation for path "<path>" If soft-limit is exceeded, a warning is issued for each creation. CLI COMMAND: gluster volume quota <volname> remove-objects [path] ========================================================================== CLI COMMAND: gluster volume quota <volname> list-objects [path] ... Sample output: ------------------ Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded? ------------------------------------------------------------------------ -------------------------------------- /dir 10 80% 10 0 Yes Yes ========================================================================== [root@snapshot-28 dir]# ls a b file11 file12 file13 file14 file15 file16 file17 [root@snapshot-28 dir]# touch a1 touch: cannot touch `a1': Disk quota exceeded * Nine files are created in directory "dir" and directory is included in * the count too. Hence the limit "10" is reached and further file creation fails ========================================================================== Note: We have also done some re-factoring in cli for volume name validation. New function cli_validate_volname is created ========================================================================== Change-Id: I1823497de4f790a2a20ebb1770293472ea33ee2b BUG: 1190108 Signed-off-by: Sachin Pandit <spandit@redhat.com> Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/9769 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep / gsyncd: use RTLD_GLOBAL while loading libgfchangelogVenky Shankar2015-03-181-2/+2
| | | | | | | | | | | | | | | | | | With the RPC based changes to {libgf}changelog, loading shared objects dynamically would need symbols to be available from other shared libraries. As an example, creating an RPC listner loads the RPC transport shared object which requires symbols to be available from already loaded shared objects. Using RTLD_GLOBAL makes the symbols available for symbol resolution of subsequently loaded libraries. Change-Id: I3d3ef790eded82911f05836c707509157680645c BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9814 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/changelog: RPC'fy {libgf}changelogVenky Shankar2015-03-1831-1302/+3971
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces RPC based communication between the changelog translator and libgfchangelog. It replaces the old pathetic stream based interaction that existed earlier (due to time constraints :-/). Changelog, upon initialization starts a RPC server (rpcsvc) allowing clients to invoke a probe API as a bootup mechanism to request for event notifications. During probe, clients can choose an event filter specifying the type(s) of events they are interested in. As of now there is no way to change the event notification set once the probe RPC call is made, but that is easier to implement. The actual event notifications is done on a separate RPC session. The client (libgfchangelog) itself starts and RPC server which the changelog translator "connects back" during probe. Notifications are dispatched by a bunch of threads from the server (translator) and the client optionally orders them if ordered notifications are requried. FOPs fill in their respective event details in a buffer (rot-buffs to be particular) and a bunch of threads (consumers) swap the buffers out of roatation and dispatch them via RPC. To avoid writer starvation, then number of dispatcher threads is one less than the number of buffer list in rot-buffs.x libgfchangelog becomes purely callback based -- upon event notification from the server (and re-ordering them if required) invoke a callback routine specified by consumer(s). A major part of the patch is also aimed at providing backward compatibility for geo-replication, which was one of the main consumer of the stream based API. Also, this patch does not\ "turn on" event notifications for all fops, just a bunch which is currently in requirement. Another pain point is that the server does not filter events before dispatching it to the clients. That load is taken up by the client itself (although it's done at the library layer rather than making it hard on the callback implementor). This needs improvement and care needs to be taken to not load the server up with expensive filtering mechanisms. Change-Id: Ibf60a432b68f2dfa60c6f9add2bcfd37a9c41395 BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9708 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* gfapi: add glfs_h_acl_set() and glfs_h_acl_get()Niels de Vos2015-03-189-16/+143
| | | | | | | | | | | | | | | | | | | | | These two functions add support for POSIX ACLs through the GFAPI-handle interface. The initial infrastructure for POSIX ACLs based on libacl has been added with the required changes to the POSIX xlator: - http://review.gluster.org/9627 NetBSD does not support POSIX ACLs, so using any of the functions should return ENOTSUP. URL: http://www.gluster.org/community/documentation/index.php/Features/Improved_POSIX_ACLs Change-Id: Ie74f3f963c3f9d576cb2f2a1e6d97e3cd4b01eda BUG: 1185654 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/9736 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Adding Libgfdb to GlusterFSJoseph Fernandes2015-03-1816-11/+4305
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ************************************************************************* Libgfdb | ************************************************************************* Libgfdb provides abstract mechanism to record extra/rich metadata required for data maintenance, such as data tiering/classification. It provides consumer with API for recording and querying, keeping the consumer abstracted from the data store used beneath for storing data. It works in a plug-and-play model, where data stores can be plugged-in. Presently we have plugin for Sqlite3. In the future will provide recording and querying performance optimizer. In the current implementation the schema of metadata is fixed. Schema: ~~~~~~ GF_FILE_TB Table: ~~~~~~~~~~~~~~~~~ This table has one entry per file inode. It holds the metadata required to make decisions in data maintenance. GF_ID (Primary key) : File GFID (Universal Unique IDentifier in the namespace) W_SEC, W_MSEC : Write wind time in sec & micro-sec UW_SEC, UW_MSEC : Write un-wind time in sec & micro-sec W_READ_SEC, W_READ_MSEC : Read wind time in sec & micro-sec UW_READ_SEC, UW_READ_MSEC : Read un-wind time in sec & micro-sec WRITE_FREQ_CNTR INTEGER : Write Frequency Counter READ_FREQ_CNTR INTEGER : Read Frequency Counter GF_FLINK_TABLE: ~~~~~~~~~~~~~~ This table has all the hardlinks to a file inode. GF_ID : File GFID (Composite Primary Key)``| GF_PID : Parent Directory GFID (Composite Primary Key) |-> Primary Key FNAME : File Base Name (Composite Primary Key)__| FPATH : File Full Path (Its redundant for now, this will go) W_DEL_FLAG : This Flag is used for crash consistancy, when a link is unlinked. i.e Set to 1 during unlink wind and during unwind this record is deleted LINK_UPDATE : This Flag is used when a link is changed i.e rename. Set to 1 when rename wind and set to 0 in rename unwind Libgfdb API: ~~~~~~~~~~~ Refer libglusterfs/src/gfdb/gfdb_data_store.h Change-Id: I2e9fbab3878ce630a7f41221ef61017dc43db11f BUG: 1194753 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Signed-off-by: Dan Lambright <dlambrig@redhat.com> Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/9683 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* marker: fix compile time warning on buf arg.Humble Devassy Chirammal2015-03-181-2/+0
| | | | | | | | | | | | | | | | | Problem: marker-quota.c: In function 'mq_inspect_directory_xattr_task': marker-quota.c:3451:31: warning: variable 'buf' set but not used [-Wunused-but-set-variable] struct iatt buf = {0,}; Change-Id: I211378328bdb2509a5d2a186d173f7f30a670c8a BUG: 1198849 Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com> Reviewed-on: http://review.gluster.org/9928 Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* Quota: Build ancestry in the lookupvmallika2015-03-183-10/+113
| | | | | | | | | | | | | | | | | | Marker can fail or can account incorrect numbers when it doesn't find a ancestry for a inode. Solution: Current build_ancestry is done only on demand in the write/create FOPs in quota enforcer. It is good to do this in the quota_lookup as well. Change-Id: I8aaf5b3e05a3ca51e7ab1eaa1b636a90f659a872 BUG: 1184885 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/9478 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Change the subvolume encoding in d_off to be a "global"Dan Lambright2015-03-1820-183/+482
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | position in the graph rather than relative (local) to a particular translator. Encoding the volume in this way allows a single translator to manage which brick is currently being scanned for directory entries. Using a single translator minimizes allocated bits in the d_off. It also allows multiple DHT translators in the same graph to have a common frame of reference (the graph position) for which brick is being read. Multiple DHT translators are needed for the Tiering feature. The fix builds off a previous change (9332) which removed subvolume encoding from AFR. The fix makes an equivalent change to the EC translator. More background can be found in fix 9332 and gluster-dev discussions [1]. DHT and AFR/EC are responsibile (as before) for choosing which brick to enumerate directory entries in over the readdir lifecycle. The client translator receiving the readdir fop encodes the dht_t. It is referred to as the "leaf node" in the graph and corresponds to the brick being scanned. When DHT decodes the d_off, it translates the leaf node to a local subvolume, which represents the next node in the graph leading to the brick. Tracking of leaf nodes is done in common utility functions. Leaf nodes counts and positional information are updated on a graph switch. [1] www.gluster.org/pipermail/gluster-devel/2015-January/043592.html Change-Id: Iaf0ea86d7046b1ceadbad69d88707b243077ebc8 BUG: 1190734 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/9688 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* CLI : GLobal option for NFS-GaneshaMeghana Madhusudhan2015-03-1811-18/+502
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A new global CLI option has been introduced for NFS-Ganesha. gluster features.ganesha enable/disable. This option is persistent and shall be inherited by new volumes created after this option is set. gluster features.ganesha enable It carries out the following functions: 1. Disables gluster-nfs across the cluster 2. Starts NFS-Ganesha server on a subset of nodes and exports '/'. 3. Creates the HA cluster for NFS-Ganesha. 4. Writes the option into the global config file. gluster features.ganesha disable 1. Stops NFS-Ganesha server. 2. Tears down the HA cluster for NFS-Ganesha With this change the older volume set options with keys "nfs-ganesha.host" and "nfs-ganesha.enable" will no longer be supported. This commit has only has the CLI related changes. Another patch will be submitted to support this feature entirely. Change-Id: Ie4b66a16c23b33b795738654b9a68f8e2c34efe3 BUG: 1188184 Signed-off-by: Meghana Madhusudhan <mmadhusu@redhat.com> Reviewed-on: http://review.gluster.org/9538 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* dht: suggest to add more bricks when min-free-disk is exceeded.Humble Devassy Chirammal2015-03-182-3/+3
| | | | | | | | | | Change-Id: I628fbd99c2478fcb8bb6e5be55e43467f25227bf BUG: 1165870 Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com> Reviewed-on: http://review.gluster.org/9879 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Lalatendu Mohanty <lmohanty@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* Tests: do not hardcode NFS state directoryEmmanuel Dreyfus2015-03-181-8/+13
| | | | | | | | | | | | | | | | | | | | tests/basic/mount-nfs-auth.t hardcoded /var/lib/glusterd/nfs/ as the NFS state directory, cuasing failures if glusterfs was configured with state in another location. Fix this by obtaning the directory through a gluster volume get command. The nfs.mount-rmtab key gives us a file inside the directory we are looking for. This fixes tests/basic/mount-nfs-auth.t regression on NetBSD. BUG: 1129939 Change-Id: I19184859c03faf5b9aeb95d080cf90fa581be380 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/9896 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* doc: correct component name in manage volumes.Humble Devassy Chirammal2015-03-181-2/+2
| | | | | | | | | Change-Id: I12cb0cdacace755d6c545c8697a4e18d6035954b BUG: 1075417 Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com> Reviewed-on: http://review.gluster.org/9869 Reviewed-by: Lalatendu Mohanty <lmohanty@redhat.com> Tested-by: Lalatendu Mohanty <lmohanty@redhat.com>
* Tests: ls portability regarding dot-filesEmmanuel Dreyfus2015-03-182-3/+3
| | | | | | | | | | | | | | | | | | | When run as root, BSD ls(1) lists dot-files, which includes .glusterfs in split-brain-healing.t's usage. This leads to failure. gfid-self-heal.t suffers the same problem. Fix by filtering out dot-files in ls(1) output NB: split-brain-healing.t also requires http://review.gluster.org/9831 to pass on NetBSD. BUG: 1129939 Change-Id: Ic572d3abf685e9b43f32ddee8a13b5f5c4ae641f Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/9885 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* glusterd: add new NFS options for exports/netgroups and related cachingNiels de Vos2015-03-183-6/+248
| | | | | | | | | | | | | | | | | The following options for the Gluster/NFS server are added : - nfs.exports-auth-enable - nfs.auth-refresh-interval-sec - nfs.auth-cache-ttl-sec BUG: 1143880 Change-Id: I37a73966c4ed27cd0f8c77200ef68a0d12b385b8 Original-author: Shreyas Siravara <shreyas.siravara@gmail.com> CC: Richard Wareing <rwareing@fb.com> CC: Jiffin Tony Thottan <jthottan@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/9364 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* doc: document glusterfind featureAravinda VK2015-03-181-0/+148
| | | | | | | | | | Change-Id: I759f05d63028d6a52c3e1ee2ab9892583c4832e7 BUG: 1193893 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/9800 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Remove compilation warningKaushal M2015-03-182-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | In glusterd_peerinfo_destroy, cast the passed 'strcut rcu_head *' pointer to 'gd_rcu_head *' before use in caa_container_of() to prevent the incompatible-pointer compilation warning. Also, refactor peerinfo->head to peerinfo->rcu_head to reduce confusion when reading code. This change was developed on the git branch at [1]. This commit is a combination of the following commits on the development branch. aa4a0bc Rename peerinfo->head to peerinfo->rcu_head c79144b Cast struct rcu_head * to gd_rcu_head * to prevent warning 1d222c3 More head -> rcu_head renames [1]: https://github.com/kshlm/glusterfs/tree/urcu BUG: 1191030 Change-Id: I7ede02090413839563ce44fdf6289697b28777e7 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/9922 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* geo-rep: Remove Generated file From gitAravinda VK2015-03-181-177/+0
| | | | | | | | | | | geo-replication/src/peer_mountbroker BUG: 1136312 Change-Id: Ib9b287b4e1183cb44acbf01184a240be7f09be7c Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/9923 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* feature/glusterfind: A tool to find incremental changesAravinda VK2015-03-1821-20/+1341
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Documentation is available in patch: http://review.gluster.org/#/c/9800/ A tool which helps to get list of modified files or list of all files in GlusterFS Volume using Changelog or find command. Usage ===== glusterfind --help Create: ------- glusterfind create --help The tool creates status file $GLUSTERD_WORKDIR/SESSION/VOLUME/status and records current timestamp to initiate the session. This timestamp will be used as start time for next runs. As part of create also generates ssh key and distributes to all peers. and enables build.pgfid and changelog using volume set command. Pre: ---- glusterfind pre --help This command is used to generate the list of files modified after session creation time or after last run. To get list of all files/dirs in Volume, run pre command with `--full` argument. The tool gets all nodes details using gluster volume info and runs node agent for each brick in respective nodes via ssh command. Once these node agents generate the output file, tool copies to local using scp. Merges all the output files to generate the final output file. Post: ----- glusterfind post --help After consuming the list, this sub command is called to update the session time based on pre command status file. List: ----- glusterfind list --help To view all the sessions Delete: ------- glusterfind delete --help Delete session. Known Issues ------------ 1. Deleted files will not get listed, since we can't convert GFID to Path if file/dir is deleted. 2. Only new name will get listed if Renamed. 3. All hardlinks will get listed. Change-Id: I82991feb0aea85cb6ec035fddbf80a2b276e86b0 BUG: 1193893 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/9682 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Make read child match check in afr optionalKrutika Dhananjay2015-03-184-0/+25
| | | | | | | | | | | | | | | Having this particular check which was introduced by commit c78998c39f0857ea7aacba360632c148afc54a55 causes a drop in performance in readdirp. So the behavior is made configurable with this patch. Change-Id: I2858fc18b3539df7aa6d3f489e0d5cfaeb8a9b3c BUG: 1202669 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9917 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* NFS-Ganesha: Volume set option for managing NFS-Ganesha exports.Meghana Madhusudhan2015-03-1812-42/+543
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | A dummy translator has been introduced as a place holder for functions related to managing NFS-Ganesha exports. A volume set option is introduced to manage volume level exports. gluster vol set <volname> ganesha.enable ON/OFF 1. gluster volume set <volname> ganesha.enable ON It creates the export config file with a unique export ID. Sends a DBus signal to export this volume dynamically. 2. gluster vol set <volname> ganesha.enable OFF Unexports the specific volume. Deletes the specfic config file related to the volume. This change also removes the handling of the older keys "nfs-ganesha.enable" and "nfs-ganesha.host" Change-Id: I8d4a0b542326a6a0c8e4711600b106274d666587 BUG: 1188184 Signed-off-by: Meghana Madhusudhan <mmadhusu@redhat.com> Reviewed-on: http://review.gluster.org/9585 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* Snapshot/clone: clone of a snapshot that will act as a regular volumeMohammed Rafi KC2015-03-187-189/+1074
| | | | | | | | | | | | | | | | | snapshot clone will allow us to take a snpahot of a snapshot. Newly created clone volume will be a regular volume with read/write permissions. CLI command snapshot clone <clonename> <snapname> Change-Id: Icadb993fa42fff787a330f8f49452da54e9db7de BUG: 1199894 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/9750 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libgfapi, timer: Fix a crash seen in timer when glfs_fini was invoked.Poornima G2015-03-172-8/+21
| | | | | | | | | | | | | | | | | | | | | | The crash is seen when, glfs_init failed for some reason and glfs_fini was called for cleaning up the partial initialization. The fix is in two folds: 1. In timer store and restore the THIS, previously it was being overwritten. 2. In glfs_free_from_ctx() and glfs_fini() check for NULL before destroying. Change-Id: If40bf69936b873a1da8e348c9d92c66f2f07994b BUG: 1202290 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/9895 Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/trash: Avoid unnecessary logging from trash_local_wipeAnoop C S2015-03-171-1/+2
| | | | | | | | | | | | | | | | | | | Even when trash translator is disabled, the following error is being logged for each unlink/truncate/ftruncate calls. [...] E [trash.c:221:trash_local_wipe] (--> ... ... ) 0-trash: invalid argument: local This change replaces GF_VALIDATE_OR_GOTO macro with simple if condition. Change-Id: I7e6754cd53ec7c2d84669b6d40d883a2d1eee41e BUG: 1132465 Signed-off-by: Anoop C S <achiraya@redhat.com> Reviewed-on: http://review.gluster.org/9909 Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* Quota/marker : Support for inode quotavmallika2015-03-1712-269/+1788
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the only way to retrieve the number of files/objects in a directory or volume is to do a crawl of the entire directory/volume. This is expensive and is not scalable. The new mechanism proposes to store count of objects/files as part of an extended attribute of a directory. Each directory's extended attribute value will indicate the number of files/objects present in a tree with the directory being considered as the root of the tree. Currently file usage is accounted in marker by doing multiple FOPs like setting and getting xattrs. Doing this with STACK WIND and UNWIND can be harder to debug as involves multiple callbacks. In this code we are replacing current mechanism with syncop approach as syncop code is much simpler to follow and help us implement inode quota in an organized way. Change-Id: Ibf366fbe07037284e89a241ddaff7750fc8771b4 BUG: 1188636 Signed-off-by: vmallika <vmallika@redhat.com> Signed-off-by: Sachin Pandit <spandit@redhat.com> Reviewed-on: http://review.gluster.org/9567 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* tests: Fix spurious failure in bug-866459.tRavishankar N2015-03-171-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | 10.TEST kill_brick $V0 $H0 $B0/${V0}1 11.-EXPECT '1' echo `pgrep glusterfsd | wc -l Problem: On my Fedora 21 laptop, #11 always fails:"not ok 11 Got "2" instead of "1" On debugging, I found that after killing, the kernel takes some time to clean up the process until which it appears as defunct in the pgrep output: root 21795 2.0 0.0 0 0 ? Zsl 11:57 0:00 [glusterfsd] <defunct> Fix: As long as TEST kill_brick is successful, we really don't need to double check with the pgrep output. Hence removing that line. Change-Id: Ia10e0a04803e54a074f73da6523fa6a98c677d58 BUG: 1163543 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9904 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* gfapi: APIs to store and process upcall notifications receivedSoumya Koduri2015-03-1714-4/+654
| | | | | | | | | | | | | | | | | | | | | | | In case of any upcall cbk events received by the protocol/client, gfapi will be notified which queues them up in a list (<gfapi_cbk_upcall>). Applicatons are responsible to provide APIs to process & notify them in case of any such upcall events queued. Added a new API which will be used by Ganesha to repeatedly poll for any such upcall event notified (<glfs_h_poll_upcall>). A new test-file has been added to test the cache_invalidation upcall events. Below link has a writeup which explains the code changes done - URL: https://soumyakoduri.wordpress.com/2015/02/25/glusterfs-understanding-upcall-infrastructure-and-cache-invalidation-support/ Change-Id: Iafc6880000c865fd4da22d0cfc388ec135b5a1c5 BUG: 1200262 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/9536 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* NFS-Ganesha: Install scripts, config files, and resource agent scriptsKaleb S. KEITHLEY2015-03-1714-2/+1388
| | | | | | | | | | | | | | | | | Resubmitting after a gerrit bug bungled the merge of http://review.gluster.org/9621 (was it really a gerrit bug?) Scripts related to NFS-Ganesha are in extras/ganesha/scripts. Config files are in extras/ganesha/config. Resource Agent files are in extras/ganesha/ocf Files are copied to appropriate locations. Change-Id: I137169f4d653ee2b7d6df14d41e2babd0ae8d10c BUG: 1188184 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/9912 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* afr: remove stale index entriesRavishankar N2015-03-1710-57/+137
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: During pre-op phase, the index xlator 1. Creates the entry inside .glusterfs/indices/xattrop 2. Winds the xattrop fop to posix to mark dirty/pending changelogs. If the brick crashes after 1, the xattrop entry becomes stale and never gets removed by shd during subsequent crawls because there is nothing to heal (changelogs are zero). Though the stale entry does not get displayed in the output of 'heal info' command, it nevertheless stays there forever unless a new write transaction is performed on the file. Fix: During index self-heal if afr xattrs are found to be clean (indicated by ret value of 2 on a call to afr_shd_selfheal(), send a dummy post-op with all 0s for the xattr values, which makes the index xlator to unlink the stale entry. Change-Id: I02cb2bc937f2e3f3f3cb35d67b006664dc7ef919 BUG: 1190069 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9714 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* every/where: add GF_FOP_IPC for inter-translator communicationJeff Darcy2015-03-1724-52/+552
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several features - e.g. encryption, erasure codes, or NSR - involve multiple cooperating translators which sometimes need a "private" means of communication amongst themselves. Historically we've used virtual or synthetic xattrs, but that's not very elegant and clutters up the getxattr/setxattr path which must also handle real xattr requests. This new fop should address that. The only argument is an int32_t "op" which should be recognized by the target translator. It is recommended that translators using these feature follow some convention regarding the ops that they define, to avoid conflicts. Using a hash of the target translator's type string as a base for a series of ops would probably be a good start. Any other information can be passed in both directions using xdata. The default behavior for this fop, as with any other, is to pass through to FIRST_CHILD. That makes use of this fop "transparent" to other translators that were written before it existed, but it also means that it only really works with pass-through translators. If a routing translator (such as DHT) or a fan-out translator (such as AFR) is involved, the IPC might not reach its intended destination unless those translators are modified to forward IPC fops along all paths. If an IPC gets all the way to storage/posix it is considered an error, much like an uncaught exception. We don't actually *do* anything in that case, but we do log it send back an EOPNOTSUPP error. This makes the "unrecognized opcode" condition distinguishable from the "no IPC support" condition (which would yield an RPC error instead) so clients can probe for the presence of a handler for their own favorite opcode and either use that or use old-school xattrs depending on the result. BUG: 1158628 Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Change-Id: I84af1b17babe5b30ec03ecf027ae37d09b873968 Reviewed-on: http://review.gluster.org/8812 Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* socket: use TCP_USER_TIMEOUT to detect client failures quickerNiels de Vos2015-03-177-12/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the network.ping-timeout to set the TCP_USER_TIMEOUT socket option (see 'man 7 tcp'). The option sets the transport.tcp-user-timeout option that is handled in the rpc/socket layer on the protocol/server side. This socket option makes detecting unclean disconnected clients more reliable. When the socket gets closed, any locks that the client held are been released. This makes it possible to reduce the fail-over time for applications that run on systems that became unreachable due to a network partition or general system error client-side (kernel panic, hang, ...). It is not trivial to create a test-case for this at the moment. We need a client that unclean disconnects and an other client that tries to take over the lock from the disconnected client. URL: http://supercolony.gluster.org/pipermail/gluster-devel/2014-May/040755.html Change-Id: I5e5f540a49abfb5f398291f1818583a63a5f4bb4 BUG: 1129787 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/8065 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Santosh Pradhan <santosh.pradhan@gmail.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* Upcall: New xlator to store various states and send cbk eventsSoumya Koduri2015-03-1719-7/+1845
| | | | | | | | | | | | | | | | | | | | | | | | | | Framework on the server-side, to handle certain state of the files accessed and send notifications to the clients connected. A generic and extensible framework, used to maintain states in the glusterfsd process for each of the files accessed (including the clients info doing the fops) and send notifications to the respective glusterfs clients incase of any change in that state. This patch handles "Inode Update/Invalidation" upcall event. Feature page: URL: http://www.gluster.org/community/documentation/index.php/Features/Upcall-infrastructure Below link has a writeup which explains the code changes done - URL: https://soumyakoduri.wordpress.com/2015/02/25/glusterfs-understanding-upcall-infrastructure-and-cache-invalidation-support/ Change-Id: Ie3d724be9a3419fcf18901a753e8ec2df2ac802f BUG: 1200262 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/9535 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* geo-rep: mountbroker user managementAravinda VK2015-03-176-1/+371
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Non root geo-replication setup is now simplified. This patch provides cli for mountbroker user and options management To set Options, gluster system:: execute mountbroker opt <KEY> <VALUE> # for example, gluster system:: execute mountbroker opt mountbroker-root /var/mountbroker-root gluster system:: execute mountbroker opt geo-replication-log-group geogroup gluster system:: execute mountbroker opt rpc-auth-allow-insecure on To remove option, gluster system:: execute mountbroker optdel <KEY> # for example, gluster system:: execute mountbroker optdel geo-replication-log-group To add/edit user, gluster system:: execute mountbroker user <USERNAME> <VOLUMES> # for example gluster system:: execute mountbroker user geoaccount slavevol1,slavevol2 To remove user, gluster system:: execute mountbroker userdel <USERNAME> # for example gluster system:: execute mountbroker userdel geoaccount For info, gluster system:: execute mountbroker info gluster system:: execute mountbroker -j info For JSON output add -j after mountbroker, for example, gluster system:: execute mountbroker -j user geoaccount slavevol1,slavevol2 PS: Each peer prints its own JSON output, aggregator required from consumer side BUG: 1136312 Change-Id: Ie52210c0bcc91ac2ffd3ba58988222ffca62b47f Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/9398 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: darshan n <dnarayan@redhat.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* inode: 'this' has been set unwantedly.Humble Devassy Chirammal2015-03-171-5/+2
| | | | | | | | | | | | | | | | | Problem: CC libglusterfs_la-inode.lo inode.c: In function 'inode_table_destroy': inode.c:1630:19: warning: variable 'this' set but not used [-Wunused-but-set-variable] xlator_t *this = NULL; Change-Id: If4b37ab896ee0a309826d4be48c6599d6ec2710b Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com> Reviewed-on: http://review.gluster.org/9846 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anoop C S <achiraya@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Poornima G <pgurusid@redhat.com>
* remove spurious failure test from bug-1087198.tvmallika2015-03-171-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | Consider below scenario in the quota xlator T1 - write with delta1 bytes on fd1 check_limit sees that delta1 bytes is not exceeding soft limit T2 - write with delta2 bytes on fd1 check_limit sees that delta2 bytes is not exceeding soft limit T3 - delta1 and delta2 bytes are written to the disk. Here delta1 and delta2 are checked separately and do not exceed limit, but they together exceed the limit which is not checked. We need to find a solution to solve this problem. Till then for other regressions to pass, we remove the the test which checks for soft limit crossed. Change-Id: I8f76754e975c3315557a4c570db8bb5d9e56de15 BUG: 1202292 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/9894 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* tests: prevent regular hangs in ec/nfs.tNiels de Vos2015-03-161-1/+4
| | | | | | | | | | | | | | | | | When the test systems gets into a memory pressure state (the Jenkins VMs do not have much RAM), the localhost NFS-mount can get hung. It is possible to prevent this by writing with O_DIRECT. Unfortnately, the 'dd' command on NetBSD does not seem to support such an option. The alternative is to reduce the I/O that can get cached on the NFS-client, like reducing the "count" option for "dd". Change-Id: I1da9cb41133bb934bcbae0a6bc091f798514ed3d BUG: 1163543 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/9883 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* Features/trash : Combined patches for trash translatorAnoop C S2015-03-1612-727/+2386
| | | | | | | | | | | | | | | | | | | | | | | | | This is the combined patch set for supporting trash feature. http://www.gluster.org/community/documentation/index.php/Features/Trash Current patch includes the following features: * volume set options for enabling trash globally and exclusively for internal operations like self-heal and re-balance * volume set options for setting the eliminate path, trash directory path and maximum trashable file size. * test script for checking the functionality of the feature * brief documentation on different aspects of trash feature. Change-Id: Ic7486982dcd6e295d1eba0f4d5ee6d33bf1b4cb3 BUG: 1132465 Signed-off-by: Anoop C S <achiraya@redhat.com> Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Reviewed-on: http://review.gluster.org/8312 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* gfapi: improve source comments and error messages.Humble Devassy Chirammal2015-03-163-5/+11
| | | | | | | | | Change-Id: I0bfa44eb5b5f21e381af3e71c26ea863e4adc46f BUG:1202274 Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com> Reviewed-on: http://review.gluster.org/9878 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* libgfapi: initialize child_down_count cond. variable.Humble Devassy Chirammal2015-03-161-0/+1
| | | | | | | | | | | | | | | currently glfs_new_from_ctx() does not initialize child_down_count conditional variable, but, glfs_free_from_ctx() destroy this variable. mnt3udp_get_export_subdir_inode() from mount3 calls glfs_free_from_ctx(), so bound to invite problems. This patch avoids the issue. Change-Id: I8c1ed83f0b39248edbb78db25c9434274b538e80 BUG: 1200879 Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com> Reviewed-on: http://review.gluster.org/9857 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* glusterd: Protect the peer list and peerinfos with RCU.Kaushal M2015-03-1618-290/+801
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The peer list and the peerinfo objects are now protected using RCU. Design patterns described in the Paul McKenney's RCU dissertation [1] (sections 5 and 6) have been used to convert existing non-RCU protected code to RCU protected code. Currently, we are only targetting guaranteeing the existence of the peerinfo objects, ie., we are only looking to protect deletes, not all updaters. We chose this, as protecting all updates is a much more complex task. The steps used to accomplish this are, 1. Remove all long lived direct references to peerinfo objects (apart from the peerinfo list). This includes references in glusterd_peerctx_t (RPC), glusterd_friend_sm_event_t (friend state machine) and others. This way no one has a reference to deleted peerinfo object. 2. Replace the direct references with indirect references, ie., use peer uuid and peer hostname as indirect references to the peerinfo object. Any reader or updater now uses the indirect references to get to the actual peerinfo object, using glusterd_peerinfo_find. Cases where a peerinfo cannot be found are handled gracefully. 3. The readers get and use the peerinfo object only within a RCU read critical section. This prevents the object from being deleted/freed when in actual use. 4. The deletion of a peerinfo object is done in a ordered manner (glusterd_peerinfo_destroy). The object is first removed from the peerinfo list using an atomic list remove, but the list head is not reset to allow existing list readers to complete correctly. We wait for readers to complete, before resetting the list head. This removes the object from the list completely. After this no new readers can get a reference to the object, and it can be freed. This change was developed on the git branch at [2]. This commit is a combination of the following commits on the development branch. d7999b9 Protect the glusterd_conf_t->peers_list with RCU. 0da85c4 Synchronize before INITing peerinfo list head after removing from list. 32ec28a Add missing rcu_read_unlock 8fed0b8 Correctly exit read critical section once peer is found. 63db857 Free peerctx only on rpc destruction 56eff26 Cleanup style issues e5f38b0 Indirection for events and friend_sm 3c84ac4 In __glusterd_probe_cbk goto unlock only if peer already exists 141d855 Address review comments on 9695/1 aaeefed Protection during peer updates 6eda33d Revert "Synchronize before INITing peerinfo list head after removing from list." f69db96 Remove unneeded line b43d2ec Address review comments on 9695/4 7781921 Address review comments on 9695/5 eb6467b Add some missing semi-colons 328a47f Remove synchronize_rcu from glusterd_friend_sm_transition_state 186e429 Run part of glusterd_friend_remove in critical section 55c0a2e Fix gluster (peer status/ pool list) with no peers 93f8dcf Use call_rcu to free peerinfo c36178c Introduce composite struct, gd_rcu_head [1]: http://www.rdrop.com/~paulmck/RCU/RCUdissertation.2004.07.14e1.pdf [2]: https://github.com/kshlm/glusterfs/tree/urcu Change-Id: Ic1480e59c86d41d25a6a3d159aa3e11fbb3cbc7b BUG: 1191030 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/9695 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Anand Nekkunti <anekkunt@redhat.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>