| Commit message | Author | Age | Files | Lines |
| |
Earlier, when the database was queried, all queried records were saved
in ASCII format in the query file. This caused problems such as
filenames containing the ASCII delimiter, consumed a lot of space, and
required a lot of parsing code in tier.c.
This change switches the query file to a binary format. All
serialization and formatting of query records is now done by libgfdb,
which provides the APIs gfdb_write_query_record() and
gfdb_read_query_record() for its users, i.e. the tier migrator and the
CTR xlator, to write to and read from the query file.
The binary format reduces disk usage by at least 50%: GFIDs are stored
as 16 binary bytes instead of a 36-byte string, the file path is no
longer stored, and the ASCII delimiters are gone.
The on-disk format of a query record is as follows:
+-----------------------------------+----------------------------+
| Length of serialized query record | Serialized query record    |
+-----------------------------------+----------------------------+
              4 bytes
Serialized query record format:
+--------+------------+-------------+-------+--------+
|  GFID  | Link count | <LINK INFO> | ..... | FOOTER |
+--------+------------+-------------+-------+--------+
   16 B       4 B       Link length             4 B
Each <LINK INFO> is serialized as:
+--------+------------------+-----------+
|  PGID  | BASE_NAME_LENGTH | BASE_NAME |
+--------+------------------+-----------+
   16 B          4 B          BASE_NAME_LENGTH bytes
FOOTER is the magic number 0xBAADF00D, marking the end of the record and
also serving as a validator of the serialized schema.
Change-Id: I9db7416fd421e118dd44eafab8b535caafe50d5a
BUG: 1272207
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/12354
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
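A minimal sketch of how a record with the layout above could be packed and
written; this is illustrative only, not the libgfdb implementation (the real
entry points are gfdb_write_query_record()/gfdb_read_query_record()), and all
names below are hypothetical, with bounds checking omitted for brevity:

    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>

    #define QUERY_RECORD_FOOTER 0xBAADF00Du
    #define GFID_LEN            16

    /* Pack one hard link: parent GFID + basename length + basename. */
    static size_t
    pack_link_info(char *buf, const unsigned char pgfid[GFID_LEN],
                   const char *bname)
    {
        uint32_t blen = (uint32_t)strlen(bname);
        size_t   off  = 0;

        memcpy(buf + off, pgfid, GFID_LEN);       off += GFID_LEN;
        memcpy(buf + off, &blen, sizeof(blen));   off += sizeof(blen);
        memcpy(buf + off, bname, blen);           off += blen;
        return off;
    }

    /* Pack GFID + link count + link infos + footer, then write the record
     * prefixed with its 4-byte length. */
    static int
    write_query_record(int fd, const unsigned char gfid[GFID_LEN],
                       uint32_t nlinks,
                       const unsigned char (*pgfids)[GFID_LEN],
                       const char **bnames)
    {
        char     buf[4096];
        size_t   off    = 0;
        uint32_t footer = QUERY_RECORD_FOOTER;
        uint32_t len;
        uint32_t i;

        memcpy(buf + off, gfid, GFID_LEN);           off += GFID_LEN;
        memcpy(buf + off, &nlinks, sizeof(nlinks));  off += sizeof(nlinks);
        for (i = 0; i < nlinks; i++)
            off += pack_link_info(buf + off, pgfids[i], bnames[i]);
        memcpy(buf + off, &footer, sizeof(footer));  off += sizeof(footer);

        len = (uint32_t)off;
        if (write(fd, &len, sizeof(len)) != sizeof(len) ||
            write(fd, buf, off) != (ssize_t)off)
            return -1;
        return 0;
    }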
| |
During attach tier the commit hash must be copied to the hot tier.
Change-Id: I91b92fd8e98696993433856e1436409b657c439d
BUG: 1277716
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12498
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
| |
Ignore bitrot related fops since they are internal fops.
Change-Id: I5db8cf4e3fa1b186a6987eed54287bc0e964fbd4
BUG: 1278326
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/12512
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
| |
Since the addition of the parallel reads patch for ec, a lock can have
more than one owner at the same time. The list of owners was stored
in the 'owner_list' field of each fop.
The problem arose with fops that require more than one lock (like
rename): the same field was used to add the fop to more than one list,
causing the previous list to be overwritten.
This has been solved by moving the 'owner_list' field from
ec_fop_data_t to the ec_lock_link_t structure.
Change-Id: I6042129f09082497b80782b5704a52c35c78f44d
BUG: 1276031
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/12445
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
| |
Problem:
1) Glusterd doesn't persist the arbiter information of a replica volume in
its store. When glusterd goes down and comes back up, arbiter volumes
become plain replica volumes.
2) Glusterd doesn't import/export arbiter information to/from the other peers.
3) Volume info doesn't show any arbiter count in its output.
Fix:
1) Persist the arbiter information in the glusterd store.
2) Import/export the arbiter information of the volume.
3) Change the volume info output to show the arbiter count.
Change-Id: I2db81e73d2694b01f7d07b08a17b41ad5a55c361
BUG: 1276675
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/12475
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
| |
Replica files used to be counted towards the migration total even though
they were ignored for migration because the replica brick did not own
the file (as decided by the replication xlator, AFR or EC).
As a result the number of migrated files showed a wrong count,
i.e. each replicated file was counted (1 + number of replicas) times.
This patch ignores such cases.
Change-Id: I91aa352ee3b0a5029790653266e9333f3947d0ac
BUG: 1276141
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/12453
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
| |
The upgrade from 3.7.5-3 to 3.7.5-5 causes the type and number
of bricks for the cold tier to be displayed incorrectly.
Change-Id: Ia45b97c35fef88f9c66e15e5bdb93fd30cb342af
BUG: 1277481
Signed-off-by: Hari Gowtham <hgowtham@redhat.com>
Reviewed-on: http://review.gluster.org/12495
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
| |
The new tiering feature currently has the GD_OP_DETACH_TIER and
GD_OP_TIER_MIGRATE enums in the middle of the glusterd_op_ enum array.
In a multi-node cluster, when one of the nodes is upgraded from a lower
version to a higher one, executing a command can end up with mismatched
op enums at the receiving end, causing command execution to fail.
The fix is to add the enum code for every new glusterd feature operation
at the end of the enum array.
Change-Id: I640f811065e8c84add624237aa80fed43fde5967
BUG: 1276643
Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
Reviewed-on: http://review.gluster.org/12473
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Nekkunti <anekkunt@redhat.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
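A toy illustration of why appending matters; the enum below is only a sketch,
not the real glusterd_op_ list. Inserting a new op in the middle shifts the
numeric value of every later op, so an upgraded node and an old node disagree
about what op code N means, while appending keeps existing values stable:

    typedef enum {
        GD_OP_NONE = 0,
        GD_OP_CREATE_VOLUME,   /* keeps the same value on every version */
        GD_OP_DELETE_VOLUME,   /* keeps the same value on every version */
        /* ... existing ops retain their numeric values ...             */
        GD_OP_NEW_FEATURE_OP,  /* new feature ops always go at the end  */
        GD_OP_MAX,
    } gd_op_example_t;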
| |
The tier query parsing code was using fscanf to read each record.
As space is a delimiter for fscanf, filenames containing spaces
caused the parsing to return unexpected values, leading to various
issues in the tier process, including crashes due to buffer
overflows.
Change-Id: Ife602cb7ecb158fccbc2c89e4d2959bd97098a87
BUG: 1276562
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12469
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
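A generic illustration of the pitfall (not the tier parsing code): fscanf's %s
conversion stops at the first whitespace, so a filename such as "my file.txt"
is split into several tokens and the rest of the record is read out of step,
while a line-oriented read keeps it intact. The input filename is hypothetical:

    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
        FILE *fp = fopen("records.txt", "r");   /* hypothetical input file */
        char  token[256];
        char  line[4096];

        if (!fp)
            return 1;

        /* Broken: "my file.txt" comes back as "my", then "file.txt". */
        if (fscanf(fp, "%255s", token) == 1)
            printf("fscanf token: '%s'\n", token);

        rewind(fp);

        /* Line-oriented read: spaces in the filename survive intact. */
        while (fgets(line, sizeof(line), fp)) {
            line[strcspn(line, "\n")] = '\0';    /* strip trailing newline */
            printf("full record:  '%s'\n", line);
        }

        fclose(fp);
        return 0;
    }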
| |
in glusterd
Change-Id: Id8c29aa46b526bc003a1d7023714b67805e35a99
BUG: 1276386
Signed-off-by: Mohamed Ashiq Liyazudeen <mliyazud@redhat.com>
Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com>
Reviewed-on: http://review.gluster.org/12461
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
| |
A snapshot should inherit snap-max-hard-limit from the original
volume when it is created, and should restore the same value when
it is restored to.
Similarly, a clone taken from a snapshot should inherit
snap-max-hard-limit from the snapshot.
Change-Id: If8e90e2ffc10e22086b803ac8e2638a16bcec968
BUG: 1275616
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/12437
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
| |
The snap-max-hard-limit currently displayed in the volume info is
propagated from the system's snap-max-hard-limit, since that is a
global option common to all volumes, so the volume info ends up
showing the system-wide value.
We should not display snap-max-hard-limit and snap-max-soft-limit
in the volume info at all, as these are snap config options and
should be set and displayed via the snap config command.
Modified bug-1113476.t to test this behaviour.
Change-Id: I90891f0cf7fb39fd686787297c7f7cd8c1e7daa1
BUG: 1276018
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/12443
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
| |
Change-Id: Ieb372cb686d32a09c6df31ec849f1b3c52e0e1cd
BUG: 1277024
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-on: http://review.gluster.org/12484
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
| |
When quota is disabled, the clean-up process can terminate without
completely cleaning up the quota xattrs. If quota is then enabled
again, the stale xattrs can mess up the accounting.
A version number is now suffixed to all quota xattrs. This version
number is internal to the marker xlator: when quota xattrs are
requested by quotad or a client, marker removes the version suffix
from the key before sending the response.
Change-Id: I1ca2c11460645edba0f6b68db70d476d8d26e1eb
BUG: 1272411
Signed-off-by: vmallika <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/12386
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Manikandan Selvaganesh <mselvaga@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
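A rough sketch of the version-suffix idea; the helpers and the exact key
format are assumptions for illustration, not the marker xlator code:

    #include <stdio.h>
    #include <string.h>

    /* Build the internal, versioned key, e.g. "trusted.glusterfs.quota.size.2"
     * (key format assumed for illustration). */
    static void
    build_versioned_key(char *out, size_t outlen, const char *key, int version)
    {
        snprintf(out, outlen, "%s.%d", key, version);
    }

    /* Strip a trailing ".<number>" before sending the response, so
     * quotad/clients keep seeing the plain, unversioned key. */
    static void
    strip_version_suffix(char *key)
    {
        char *dot = strrchr(key, '.');

        if (dot && dot[1] != '\0' &&
            strspn(dot + 1, "0123456789") == strlen(dot + 1))
            *dot = '\0';
    }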
| |
Summary:
- Using the sampling feature you can record details about every Nth FOP.
The fields in each sample are: FOP type, hostname, uid, gid, FOP priority,
port and the time taken (latency) to fulfill the request.
- Implemented using a ring buffer which is not (m/c) allocated in the IO path;
this should make the sampling process pretty cheap.
- DNS resolution is done at dump time, not at sample time, for performance,
with a cache.
- The metrics can be used for diagnostics and traffic/IO profiling as well
as P95/P99 calculations.
- To control this feature there are two new volume options:
diagnostics.fop-sample-interval - The sampling interval, e.g. 1 means
sample every FOP, 100 means sample every 100th FOP
diagnostics.fop-sample-buf-size - The size (in bytes) of the ring
buffer used to store the samples. In the event more samples
are collected in the stats dump interval than can be held in this buffer,
the oldest samples shall be discarded. Samples are stored in the log
directory under /var/log/glusterfs/samples.
- Uses the DNS cache written by sshreyas@fb.com (Thank you!); the DNS cache
TTL is controlled by the diagnostics.stats-dnscache-ttl-sec option
and defaults to 24 hrs.
Test Plan:
- Valgrind'd to ensure it's leak free
- Run prove test(s)
- Shadow testing on 100+ brick cluster
Change-Id: I9ee14c2fa18486b7efb38e59f70687249d3f96d8
BUG: 1271310
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/12210
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
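A minimal sketch of the "record every Nth FOP into a ring buffer" idea; the
structure, names, and a slot-counted (rather than byte-sized) buffer are
illustrative assumptions, not the io-stats code:

    #include <stdint.h>
    #include <sys/types.h>

    #define SAMPLE_SLOTS 1024

    typedef struct {
        int      fop_type;
        uid_t    uid;
        gid_t    gid;
        uint64_t latency_usec;
    } fop_sample_t;

    static fop_sample_t sample_buf[SAMPLE_SLOTS];   /* fixed, pre-allocated   */
    static uint64_t     fop_count;                  /* FOPs seen so far       */
    static uint64_t     next_slot;                  /* next ring position     */
    static uint64_t     sample_interval = 100;      /* sample every 100th FOP */

    void
    maybe_sample_fop(int fop_type, uid_t uid, gid_t gid, uint64_t latency_usec)
    {
        fop_sample_t *s;

        if (++fop_count % sample_interval != 0)
            return;                             /* common path does no work   */

        s = &sample_buf[next_slot++ % SAMPLE_SLOTS];
        s->fop_type     = fop_type;             /* oldest samples overwritten */
        s->uid          = uid;
        s->gid          = gid;
        s->latency_usec = latency_usec;
    }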
| |
This change includes an additional fix (a forward port of a fix made
on the release-3.x branches) to address a comment made after
the original change was merged on the master branch.
* release-3.7
* Change-Id: Ie15c5919e5bf9b0a1c66e20dc42d80fdfa8bd7f4
* BZ: 1227808
* http://review.gluster.org/11069
Change-Id: I4fc2672ab1a17998b2e40bc43eb6a3e15058a086
BUG: 1109180
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/11067
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
| |
Issue: probing a new node (>= 3.6) from a 3.5 cluster moves the peer to the rejected state.
Fix: disperse volume support was added in the 3.6 release, so write the disperse fields
(disperse_count=0 and redundancy_count=0) to the vol info file only if the cluster version
supports them.
Change-Id: I11d5e2e337b9bbaddc8e52ca7295ba481beb1132
BUG: 1276423
Signed-off-by: anand <anekkunt@redhat.com>
Reviewed-on: http://review.gluster.org/12464
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaushal M <kaushal@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
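A small sketch of the gating described above; the helper name and the exact
op-version constant are assumptions, not the glusterd store code:

    /* Assumed value of the 3.6.0 op-version constant, for illustration. */
    #define GD_OP_VERSION_3_6_0 30600

    int
    should_write_disperse_fields(int cluster_op_version)
    {
        /* Peers older than 3.6 cannot parse disperse_count/redundancy_count,
         * so those keys are written only when every peer understands them. */
        return cluster_op_version >= GD_OP_VERSION_3_6_0;
    }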
| |
changing some of the function names added recently as
part of the tiering changes.
Change-Id: I238831128ee00cdf83f8a80be937d3528d133099
BUG: 1275489
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/12431
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
| |
While taking a snapshot, the export file used by the volume should be
copied to the snap directory, so that when the snapshot is restored,
the volume can retain all its configuration for exporting via
nfs-ganesha. The export file is stored at "/etc/ganesha/export" in
the following format: "export.<volname>.conf".
The fix handles the given cases in the following manner:
case a: nfs-ganesha (global) is ON during snapshot and restore.
i.) The volume was exported during snapshot. When we restore the snapshot,
the volume should be exported again with the old configuration file.
ii.) The volume was unexported during snapshot. When we restore the snapshot,
the volume should be unexported again.
case b: nfs-ganesha is ON during snapshot and OFF during restore.
The volume was exported during snapshot. When we restore the snapshot, the
conf file is copied to the corresponding location and, if nfs-ganesha is
enabled again, the volume is exported.
For clones, the export conf file is created in /etc/ganesha/export and the
clone is then exported via ganesha.
Change-Id: Ideecda15bd4db58e991cf6c8de7bb93f3db6cd20
BUG: 1257709
Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
Reviewed-on: http://review.gluster.org/12034
Reviewed-by: Avra Sengupta <asengupt@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
| |
CTR is currently disabled by default, and must be manually enabled
for tiering to start. This is an overhead on the administrator and
easy to overlook. Enable it automatically when a tier is attached.
Change-Id: I0c29de8762faec1bfe6d1376a57eeef3357ad15a
BUG: 1274847
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12420
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
| |
The changes from http://review.gluster.org/#/c/12162/ added a path variable
to nfs3_log_common_res(), and usually `cs->resolvedloc.path` is
passed for it. But in certain fop functions `cs` may not be filled in due
to an error, and logging it using nfs3_log_common_res() then results in a crash.
This patch fixes that.
Change-Id: I5a709818923e7884bd04e329834ee352a1b3a58f
BUG: 1276243
Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
Reviewed-on: http://review.gluster.org/12458
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
| |
credit: R. Gowdappa
Change-Id: I3bc1534e499f2eccd114db69a29c0b2ce82775db
BUG: 1273315
Reviewed-on: http://review.gluster.org/12374
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
| |
Currently rebalance aborts on essentially any failure, such as a failed
fix-layout of a directory, readdirp, opendir, etc. Unless it is a
remove-brick process, we can ignore these failures.
Major impact: any failure in gf_defrag_process_dir means there
are files left unmigrated in that directory.
A fix-layout (setxattr) failure affects the directory's child subtree, i.e.
the child subtree will not be rebalanced.
A settle-hash (commit-hash) failure triggers lookup_everywhere for the
immediate children until the next commit-hash.
Note: the remove-brick operation is still sensitive to any kind of failure.
Change-Id: I08ab71909bc832f03cc1517172525376f7aed14a
BUG: 1257076
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/12013
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
| |
Problem:
afrv2 takes locks from infinity-2 to infinity-1 to remain compatible with
<=3.5.x clients. For arbiter volumes this leads to problems, as the I/O
then takes full-file locks.
Solution:
Don't maintain compatibility with <=3.5.x clients on arbiter volumes, since
arbiter volumes were only introduced in 3.7.
Change-Id: I48d6aab2000cab29c0c4acbf0ad356a3fa9e7bab
BUG: 1275247
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/12426
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
| |
Various xlators and other components invoke system calls
directly instead of using the libglusterfs/syscall.[ch] wrappers.
Where the system call wrappers are not used, there should be a comment
in the source explaining why the wrapper isn't used.
Change-Id: I8ef94c48728666465abf126c778b70c9e5c00e47
BUG: 1267967
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/12273
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
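A sketch of the convention being enforced; sys_open() is the wrapper name used
by libglusterfs/syscall.h at the time, but treat the exact signature as an
assumption:

    #include <fcntl.h>

    int
    open_state_file(const char *path)
    {
        /* Preferred: the libglusterfs wrapper.
         *     fd = sys_open(path, O_RDONLY, 0);
         *
         * If the raw libc call really is needed, the commit asks for a
         * comment at the call site explaining why the wrapper is skipped:
         */
        int fd = open(path, O_RDONLY);   /* not sys_open(): <reason goes here> */
        return fd;
    }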
| |
Various xlators and other components invoke system calls
directly instead of using the libglusterfs/syscall.[ch] wrappers.
Where the system call wrappers are not used, there should be a comment
in the source explaining why the wrapper isn't used.
Change-Id: I1f47820534c890a00b452fa61f7438eb2b3f667c
BUG: 1267967
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/12276
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
| |
Various xlators and other components invoke system calls
directly instead of using the libglusterfs/syscall.[ch] wrappers.
Where the system call wrappers are not used, there should be a comment
in the source explaining why the wrapper isn't used.
Change-Id: I28bf2a5f7730b35914e7ab57fed91e1966b30073
BUG: 1267967
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/12379
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
| |
Problem: If a file is created with zeroes ('dd', 'fallocate' etc.) when
a brick is down, the self-heal does not write the zeroes to the sink
after it comes up. Consequently, there is a mismatch in disk usage
amongst the bricks of the replica.
Fix: If we definitely know that the file is not sparse, then write the
zeroes to the sink even if the checksums match.
Change-Id: Ic739b3da5dbf47d99801c0e1743bb13aeb3af864
BUG: 1272460
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/12371
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
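A sketch of the healing decision described above; the function and flags are
illustrative, not the AFR self-heal code:

    int
    should_write_block(int checksums_match, int block_is_all_zeroes,
                       int file_known_not_sparse)
    {
        if (!checksums_match)
            return 1;   /* contents differ: heal as usual                    */
        if (block_is_all_zeroes && file_known_not_sparse)
            return 1;   /* write the zeroes so disk usage matches the source */
        return 0;       /* identical data on a possibly sparse file: skip    */
    }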
| |
xattr value has changed
Change-Id: Ia3225a523287f6689b966ba4f893fc1b1fa54817
BUG: 1272986
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/12400
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
| |
During fix-layout heal, files are scanned. A scanned file exists on either the
hot or the cold subvolume; those not found on the cold tier exist on the hot
one and should not be flagged as an error.
Replace INFO with TRACE for common tier migration logs, since frequent
migration was growing the log files too quickly.
On migration failures, do not accrue files towards the cycle limit's budget.
Change-Id: Ie832ee07c43bce5477ae81c939d1fe8416a11615
BUG: 1275383
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12430
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Joseph Fernandes
| |
Problem: the readdir/readdirp fops call [f]xattrop with
fop->good, which contains only one brick for these operations.
That causes the xattrop to fail, as it requires at least
the "minimum" number of bricks.
Solution: use lock->good_mask to call xattrop. lock->good_mask
contains all the good locked bricks on which the previous write
operation was successful.
Change-Id: If1b500391aa6fca6bd863702e030957b694ab499
BUG: 1274629
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: http://review.gluster.org/12419
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| |
After an upgrade, the op-version is expected to be updated through gluster
volume set. If the new version introduces a feature that changes the volinfo
structure, not storing the default values of the new options results in
cksum issues.
Change-Id: I57b4667f3403839811735bf66bef29e5200a9241
BUG: 1262805
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-on: http://review.gluster.org/12171
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
| |
This is a series of patches which aims to fix geo-replication
on a tiered volume.
Problem:
Consider a file that is placed in the volume before a hot tier is
attached. During any operation on the file, a lookup creates a linkto
file in the hot tier.
Now, any namespace operation carried out on the file is recorded in
both the cold and the hot tier's changelogs.
This leaves room for races when both changelogs are replayed.
Solution:
We are going to replay (namespace-related) operations
only in the hot tier.
Why?
a. If the file is directly placed in the hot tier, all fops are
recorded in the hot tier.
b. If the file is already present in the cold tier and any fop is
carried out on it, a linkto file is created in the hot tier.
Operations like UNLINK and RENAME are then captured in the hot tier (by means
of the linkto file).
This way, we get both tiers' operations in the hot tier itself.
But we may miss the initial data sync immediately after creating the
file, as only the MKNOD is recorded. So, if a MKNOD is encountered
with the sticky bit set, queue a DATA operation for the corresponding gfid.
(The geo-rep side of this is addressed in http://review.gluster.org/12326/ )
So, if the tier-dht linkto is set, we need to record the corresponding
MKNOD. Earlier this was avoided as it was marked an INTERNAL fop.
(That is addressed in this patch.)
Change-Id: I25514fe3e25f68592a8d6361507f8c8a4fcb70b1
BUG: 1266875
Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com>
Reviewed-on: http://review.gluster.org/12417
Reviewed-by: Aravinda VK <avishwan@redhat.com>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
| |
Recording of the tiering rebalance process's fops, such as file creation
and deletion, must be avoided.
Ignore these fops based on the corresponding pid.
Change-Id: Ifdc7765598d04d033f93e6339e9b188f7566cb65
BUG: 1266875
Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com>
Reviewed-on: http://review.gluster.org/12239
Reviewed-by: Aravinda VK <avishwan@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
| |
While setting a wrong value for watermark-hi/low, the output
shows "compatiblevalue" whereas it should be "compatible value".
Change-Id: I29c8f9a954928d22e436465f4ebc30bd08640138
BUG: 1275502
Signed-off-by: hari gowtham <hgowtham@redhat.com>
Reviewed-on: http://review.gluster.org/12432
Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
| |
Approach:
The shard xlator on the slave side is bypassed for all fops
coming from the geo-rep mount. So each shard on the master is
treated as a separate file by geo-rep and synced separately to the
slave. The extended attribute in which shard maintains the
size is also synced from the master; shard on the slave doesn't
calculate it by itself.
Pre-requisites:
1. If the master is a sharded volume, the slave should also be sharded.
2. The slave's shard configuration should be the same as the master's.
3. The geo-rep config for xattr sync should not be disabled.
All other dependent patches:
1. http://review.gluster.org/#/c/12205/
2. http://review.gluster.org/#/c/12206/
3. http://review.gluster.org/#/c/12225/
4. http://review.gluster.org/#/c/12226/
Change-Id: I474220d69fa030b1e06a4fa0868c34fabe02efcf
BUG: 1265148
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/12228
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| |
doing inode/entry invalidations.
Writing to a pipe can block if the pipe is full. This can lead to deadlocks
in some situations. Consider the following situation:
1. The kernel sends a write on an inode. The client is waiting for a response
to the write from the brick.
2. A lookup happens on behalf of a different application/thread on the
same inode. In response, mdc tries to invalidate the inode.
3. fuse_invalidate_inode is called. It writes an invalidation request
to the pipe. Another thread, which reads from this pipe, writes the
request to /dev/fuse. The invalidate code in the fuse kernel module
tries to acquire a lock on all pages for the inode and is blocked, as
a write is in progress on the same inode (step 1).
4. Now the poller thread is blocked in the invalidate notification and cannot
receive any messages from the same socket (on which the lookup response
came). But the client is expecting a response for the write on that same
socket (again step 1), and we have a deadlock.
The deadlock can be solved in two ways:
1. Use a queue (and a condition variable for notifications) to pass
invalidation requests from the poller to the invalidate thread. This is a
variant of using a non-blocking pipe, but doesn't have any limit on the
amount of data (worst case we run out of memory and error out).
2. Allow events from sockets immediately after we read one
rpc-msg. Currently we disallow events until that rpc-msg is read from the
socket, processed and handled by higher layers. That way we won't run
into these kinds of issues. It would also increase parallelism in
reading from sockets.
This patch implements solution 1 above.
Change-Id: I8e8199fd7f4da9eab46a719d9292f35c039967e1
BUG: 1273387
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/12402
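A minimal sketch of "solution 1" above: the poller thread only enqueues the
invalidation request and returns, while a dedicated thread drains the queue
and performs the possibly blocking write to /dev/fuse. Names are illustrative,
not the fuse-bridge code:

    #include <pthread.h>
    #include <stdlib.h>

    typedef struct inval_req {
        struct inval_req *next;
        /* inode/entry details would go here */
    } inval_req_t;

    static inval_req_t    *head, *tail;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

    /* Called from the poller thread: never blocks on the pipe. */
    void
    enqueue_invalidate(inval_req_t *req)
    {
        req->next = NULL;
        pthread_mutex_lock(&lock);
        if (tail)
            tail->next = req;
        else
            head = req;
        tail = req;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }

    /* Dedicated invalidate thread: the only place allowed to block. */
    void *
    invalidate_worker(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (!head)
                pthread_cond_wait(&cond, &lock);
            inval_req_t *req = head;
            head = req->next;
            if (!head)
                tail = NULL;
            pthread_mutex_unlock(&lock);

            /* write req to /dev/fuse here; blocking no longer stalls the poller */
            free(req);
        }
        return NULL;
    }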
| |
1. Call local->transaction.wind() only on subvols where the pre-op
succeeded.
2. Update op_errno in the afr_changelog_cbk call path. This fixes a bug in
commit 7945121dda340ec8f25711b2ad3ca70b544de967 where we returned EUCLEAN
to the application if the pre-op failed on all bricks.
Change-Id: Iab8776e49a992e7a255314bba542742f7607f3ec
BUG: 1272362
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/12415
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| |
Correcting the internal fop calculation method, as its logic was wrong.
Change-Id: I1d0b40a1e27548147203ddd503794059652ac049
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/12418
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
| |
Replacing repetitive code like this with code generated from a more
compact "canonical" definition carries several advantages.
* Ease the process of adding new fops (e.g. GF_FOP_IPC).
* Ease the process of making global changes to existing fops (e.g.
adding "xdata").
* Ensure strict consistency between all of the pieces that must be
compatible with each other, through both kinds of changes.
What we have right now is just a start. The above benefits will only
truly be realized when we use the same definitions to generate stubs,
syncops, and perhaps even parts of gfapi or glupy.
This same infrastructure can also be used to reduce code duplication and
potential for error in many of our translators. NSR already uses a
similar technique, using a few hundred lines of templates to generate a
few *thousand* lines of code. The ability to make a global "aspect"
change (e.g. to quorum checking) in one place instead of seventy has
already been demonstrated there.
Other candidates for code generation include the AFR/EC transaction
infrastructure, or stub creation/resumption in io-threads.
Change-Id: If7d59de7a088848b557f5aea00741b4fe19017c1
BUG: 1271325
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/9411
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
| |
Snapshots of tiered volumes cannot handle files undergoing migration.
We implement a helper mechanism to "pause" migration: any files
undergoing migration are aborted, and clean-up is done to remove the
sticky bits and data at the destination. Migration is restarted
after the snapshot completes.
For testing, an internal switch is added. It is not exposed externally.
gluster volume set vol1 tier-pause [true|false]
Change-Id: Ia85bbf89ac142e9b7e73fcbef98bb9da86097799
BUG: 1267950
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12304
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
| |
local->op_ret was not set correctly in dht_fsync_cbk in the case
of files being migrated.
Change-Id: If73ae04368ea0c7f6868c8704dfc2deb2faee753
BUG: 1273372
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12401
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
| |
When bricks are down, promotion/demotion should still be possible.
For example, if an EC brick is down, the other bricks are able to
recover the data and migrate it.
Change-Id: I8e650c640bce22a3ad23d75c363fbb9fd027d705
BUG: 1273215
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12397
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Joseph Fernandes
| |
In glusterd_snapshot_clone_postvalidate(), we were deleting the
snap object and snap volume by looking up the snapname. Hence, it
was deleting the original snapshot from which the clone was
being created.
Instead it should fetch the clone name, the respective
clone volume, and its corresponding snap object, and delete them.
Also, glusterd_snap_remove() needs to differentiate a clone
snap object from a snapshot snap object: for a clone
snap object there is no persisted data in
/var/run/gluster/snaps/, so it shouldn't try to delete
anything there.
Change-Id: I02bb22a3898d5720e318a02d6cc32d25f75d317d
BUG: 1272339
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/12364
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
| |
1. When winding the pre-op, transaction.pre_op[i] is set. If the pre-op fails,
transaction.failed_subvols[i] is set. If it fails on all children, we can
proceed directly to unlock (via afr_changelog_post_op_now) without trying
to wind the write, fail, and then go to unlock.
2. 'fop_subvols' seems to be an unused variable, hence removing it.
Change-Id: I9525628daf48082e979b0093fa0478934495e61f
BUG: 1272362
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/12368
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Anuradha Talur <atalur@redhat.com>
| |
Problem: in the svc_statfs function, wipe_loc is called on the loc
passed by nfs. This loc is also used by svc_stat, which
throws an error if loc->inode is NULL.
Solution: wipe_loc should be called on the local root_loc instead.
Change-Id: I9cc5ee3b1bd9f352f2362a6d997b7b09051c0f68
BUG: 1260848
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: http://review.gluster.org/12123
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| |
On a write to a replica volume, an entry is recorded in every brick's database.
When the tier daemon runs, it will only move the file if it is the true
owner of the file as defined by XATTR_NODE_UUID_KEY.
Change-Id: Ib82717f87a3f94f3d0d9f969773de9e88d6aaf22
BUG: 1273043
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12391
Reviewed-by: Joseph Fernandes
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
| |
Each tier layer (for future stacking implementations)
must have a unique xattr name. We are currently using
the name of the tier subvolume excluding the volume name.
Change-Id: Id4adea61dc1c8473fb1d4d7364d1940278c6e129
BUG: 1259298
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12350
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
| |
Problem: if a brick is down during write and rebalance, index
entries are created. If the same file then gets migrated to
another subvol by the rebalance process, these index entries
remain in the index directory. During heal, these indices should
be removed when we get ENOENT or ESTALE for an index.
Solution: capture the correct errno and take the appropriate action
to purge these indices.
Change-Id: I1aad8b99e4df2e139648e3bf971e4cb1c4b38699
Bug: 1271358
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: http://review.gluster.org/12353
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| |
* The maximum number of inodes in the LRU list of the inode table was being
defined in terms of memory units (GF_UNIT_MB) instead of a count, and the
description of the option also referred to it in memory units instead of a count.
Change-Id: I48f07e7d2826406697eb2a13714ab22feae81d89
BUG: 1266883
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/12242
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>