Feature
-------

Scheduling Of Snapshot

Summary
-------

GlusterFS volume snapshots provide point-in-time copies of a GlusterFS
volume. Currently, GlusterFS volume snapshots can be easily scheduled by
setting up cron jobs on one of the nodes in the GlusterFS trusted
storage pool. This approach has a single point of failure (SPOF), as
scheduled jobs are missed if the node running the cron jobs dies.

We can avoid the SPOF by distributing the cron jobs to all nodes of the
trusted storage pool.

Owner(s)
--------

Avra Sengupta <asengupt@redhat.com>

Copyright
---------

Copyright (c) 2015 Red Hat, Inc. <http://www.redhat.com>

This feature is licensed under your choice of the GNU Lesser General
Public License, version 3 or any later version (LGPLv3 or later), or the
GNU General Public License, version 2 (GPLv2), in all cases as published
by the Free Software Foundation.

Detailed Description
--------------------

The solution to the above problem involves the use of:

-   A shared storage - A gluster volume by the name of
    "gluster\_shared\_storage" is used as a shared storage across nodes
    to co-ordinate the scheduling operations. This shared storage is
    mounted at /var/run/gluster/shared\_storage on all the nodes.

-   An agent - This agent will perform the actual snapshot commands,
    instead of cron. It will contain the logic to perform coordinated
    snapshots.

-   A helper script - This script allows the user to initialise the
    scheduler on the local node, enable/disable scheduling, and
    add/edit/list/delete snapshot schedules.

-   cronie - The default cron daemon shipped with RHEL. It invokes the
    agent at the intervals specified by the user, to perform the
    snapshot operation on the volume named in the user's schedule (see
    the sketch below).
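
As an illustration, the crontab entry that cronie runs might look like
the following. This is a hypothetical sketch: the exact path to the
agent and the argument order are internal details of the scheduler.

```
# Hypothetical cron entry: every 30 minutes, run the agent with
# the volume name and job name (argument order assumed).
*/30 * * * * root gcron.py test_vol Job1
```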

Initial Setup
-------------

The administrator needs to create a shared storage that is available
to all nodes across the cluster. A GlusterFS volume by the name of
"gluster\_shared\_storage" should be created for this purpose. It is
preferable that the *shared volume* be a replicate volume, to avoid a
SPOF.

Once the shared storage is created, it should be mounted on all nodes
in the trusted storage pool that will participate in the scheduling.
The location where the shared\_storage must be mounted
(/var/run/gluster/shared\_storage) on these nodes is fixed and is not
configurable. Each node participating in the scheduling then needs to
initialise the snapshot scheduler by invoking the following:

snap\_scheduler.py init

NOTE: This command needs to be run on all the nodes participating in the
scheduling
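
For example, the one-time setup on a three-node pool might look like
the following. The host names and brick paths are placeholders, and a
replica 3 volume is an assumption; any replicate configuration works.

```
# Create and start the shared storage volume (run once, from any node).
gluster volume create gluster_shared_storage replica 3 \
    node1:/bricks/shared node2:/bricks/shared node3:/bricks/shared
gluster volume start gluster_shared_storage

# On every node participating in the scheduling: mount the shared
# storage at the fixed location, then initialise the scheduler.
mount -t glusterfs localhost:/gluster_shared_storage \
    /var/run/gluster/shared_storage
snap_scheduler.py init
```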

Helper Script
-------------

The helper script (snap\_scheduler.py) allows the user to initialise
the scheduler on the local node, enable/disable scheduling, and
add/edit/list/delete snapshot schedules.

a) snap\_scheduler.py init

This command initialises the snap\_scheduler and interfaces it with
the crond running on the local node. This is the first step, and must
be performed before executing any scheduling-related commands from a
node.

NOTE: The helper script needs to be run with this option on all the
nodes participating in the scheduling. Other options of the helper
script can be run independently from any node where initialisation has
been successfully completed.

b) snap\_scheduler.py enable

The snap scheduler is disabled by default after initialisation. This
command enables the snap scheduler.

c) snap\_scheduler.py disable

This command disables the snap scheduler.

d) snap\_scheduler.py status

This command displays the current status (Enabled/Disabled) of the
snap scheduler.

e) snap\_scheduler.py add "Job Name" "Schedule" "Volume Name"

This command adds a new snapshot schedule. It takes three arguments,
each of which must be provided within double quotes (""):

-\> Job Name: This name uniquely identifies this particular schedule,
and can be used to reference this schedule for future events like
edit/delete. If a schedule already exists for the specified Job Name,
the add command will fail.

-\> Schedule: The schedules are accepted in the format crond
understands:

```
# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR
# |  |  |  |  |     sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  *  user-name  command to be executed
```

Although all valid cron schedules are accepted, the supported
granularity of snapshot schedules is currently limited to half-hourly
snapshots at the finest.

-\> Volume Name: The name of the volume on which the scheduled snapshot
operation will be performed.
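
For instance, the following invocation (the job and volume names are
placeholders) schedules a half-hourly snapshot of a volume named
test\_vol:

```
# Snapshot "test_vol" every 30 minutes, the finest supported granularity.
snap_scheduler.py add "Job1" "*/30 * * * *" "test_vol"
```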

f) snap\_scheduler.py edit "Job Name" "Schedule" "Volume Name"

This command edits an existing snapshot schedule. It takes the same
three arguments that the add option takes, each of which must be
provided within double quotes (""). If a schedule does not exist for
the specified Job Name, the edit command will fail.

g) snap\_scheduler.py delete "Job Name"

This command deletes an existing snapshot schedule. It takes the job
name of the schedule as its argument, which must be provided within
double quotes (""). If a schedule does not exist for the specified Job
Name, the delete command will fail.
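
For instance, building on the hypothetical Job1 above:

```
# Change Job1 to an hourly schedule, then remove it.
snap_scheduler.py edit "Job1" "0 * * * *" "test_vol"
snap_scheduler.py delete "Job1"
```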

h) snap\_scheduler.py list

This command lists the existing snapshot schedules in the following
format:

```
# snap_scheduler.py list
JOB_NAME         SCHEDULE         OPERATION        VOLUME NAME
--------------------------------------------------------------------
Job0             * * * * *        Snapshot Create  test_vol
```

The agent
---------

The schedules created with the help of the helper script are read by
crond, which invokes the agent (gcron.py) at the scheduled intervals
to perform the snapshot operations on the specified volumes. The agent
coordinates the scheduled snapshots across nodes using the following
algorithm.

Pseudocode:

```
start_time = get current time
lock_file  = job name passed as an argument
vol_name   = volume name passed as an argument
try POSIX locking the $lock_file
    if lock is obtained, then
        mod_time = get modification time of $lock_file
        if $mod_time < $start_time, then
            take snapshot of $vol_name
            if snapshot failed, then
                log the failure
            update modification time of $lock_file to current time
        unlock the $lock_file
```

The coordination with the agents running on other nodes is handled by
the use of POSIX locks. All the instances of the agent attempt to lock
the lock\_file, which is essentially an empty file named after the
job, and the one which gets the lock takes the snapshot.

To prevent redoing a completed task, the agent makes use of the mtime
attribute of the lock\_file. At the beginning of its execution, the
agent saves its start time. Once it obtains the lock on the
lock\_file, and before taking the snapshot, it compares the mtime of
the lock\_file with the saved start time. The snapshot is taken only
if the mtime is smaller than the start time. Once the snapshot command
completes, the agent updates the mtime of the lock\_file to the
current time before unlocking.
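
A minimal Python sketch of this coordination logic is shown below. The
function name and the snapshot name are illustrative placeholders, not
the actual gcron.py implementation; only the lock-then-compare-mtime
flow follows the description above.

```python
import fcntl
import os
import subprocess
import syslog
import time

def take_coordinated_snapshot(lock_file, vol_name):
    """Snapshot vol_name unless a peer node already has.

    lock_file is the per-job (empty) file on the shared storage;
    its mtime records when the job last completed.
    """
    start_time = time.time()
    fd = os.open(lock_file, os.O_RDWR)
    try:
        # POSIX lock: exactly one node in the pool wins this.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except (IOError, OSError):
        os.close(fd)
        return  # a peer holds the lock and is handling the job

    try:
        # If a peer finished after we started, its mtime update is
        # newer than our start time: skip the duplicate run.
        if os.stat(lock_file).st_mtime < start_time:
            # The snapshot name below is an illustrative placeholder.
            ret = subprocess.call(["gluster", "snapshot", "create",
                                   "scheduled_snap", vol_name])
            if ret != 0:
                # Log and move on; no retry until the next schedule.
                syslog.syslog("snapshot of %s failed" % vol_name)
            now = time.time()
            os.utime(lock_file, (now, now))  # mark the job as done
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)
```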

If a snapshot command fails, the agent logs the failure (in syslog)
and continues with its operation. It does not retry the failed
snapshot in the current schedule, but will attempt it again at the
next scheduled time. It is left to the administrator to monitor the
logs and decide what to do after a failure.

Assumptions and Limitations
---------------------------

It is assumed that all nodes in the trusted storage pool have their
times synced using NTP or any other mechanism. This is a hard
requirement for this feature to work.

The administrator needs to have python2.7 or higher, along with the
argparse module, installed in order to use the helper script
(snap\_scheduler.py).

There is a latency of one minute between issuing a command via the
helper script and that command taking effect. Hence, snapshot
schedules with per-minute granularity are currently not supported.

The administrator can, however, leverage the scheduler to schedule
snapshots at half-hourly, hourly, daily, weekly, monthly, or yearly
intervals. They can also create customised schedules specifying the
minute of the hour, the day of the week, the day of the month, and the
month of the year at which the snapshot operation should run.