README.rst 8.96 KB
Newer Older
1 2 3 4 5 6 7 8 9 10 11 12 13

Base resilient stack
====================

This stack is meant to be extended by SR profiles, or other stacks, that need to provide
automated backup/restore, election of backup candidates, and instance failover.

As reference implementations, both stack/lamp and stack/lapp define resilient behavior for
MySQL and Postgres respectively.

This involves three different software_types:

 * pull-backup
Marco Mariani's avatar
Marco Mariani committed
14 15
 * {mysoftware}_export
 * {mysoftware}_import
16

Marco Mariani's avatar
Marco Mariani committed
17
where 'mysoftware' is the component that needs resiliency (can be postgres, mysql, erp5, and so on).
18 19 20 21 22 23 24


pull-backup
-----------

This software type is defined in

25
    https://lab.nexedi.com/nexedi/slapos/blob/HEAD/stack/resilient/instance-pull-backup.cfg.in
26 27 28 29 30 31 32 33 34 35 36

and there should be no reason to modify or extend it.

An instance of type 'pull-backup' will receive data from an 'export' instance and immediately populate an 'import' instance.
The backup data is automatically used to build an historical, incremental archive in srv/backup/pbs.


export
------

example:
37
    https://lab.nexedi.com/nexedi/slapos/blob/HEAD/stack/lapp/postgres/instance-postgres-export.cfg.in
38 39 40 41

This is the *active* instance - the one providing live data to the application.

A backup is run via the bin/exporter script: it will
Marco Mariani's avatar
Marco Mariani committed
42
     1) run bin/{mysoftware}-backup
43 44 45 46
 and 2) notify the pull-backup instance that data is ready.

The pull-backup, upon receiving the notification, will make a copy of the data and transmit it to the 'import' instances.

Marco Mariani's avatar
Marco Mariani committed
47
You should provide the bin/{mysoftware}-exporter script, see for instance
48 49
  https://lab.nexedi.com/nexedi/slapos/blob/HEAD/slapos/recipe/postgres/__init__.py#L207
  https://lab.nexedi.com/nexedi/slapos/blob/HEAD/slapos/recipe/mydumper.py#L71
50 51

By default, as defined in
52
  https://lab.nexedi.com/nexedi/slapos/blob/HEAD/stack/resilient/pbsready-export.cfg.in#L27
53 54 55 56 57 58 59 60
the bin/exporter script is run every 60 minutes.



import
------

example:
61
    https://lab.nexedi.com/nexedi/slapos/blob/HEAD/stack/lapp/postgres/instance-postgres-import.cfg.in
62 63 64 65 66 67

This is the *fallback* instance - the one that can be activated and thus become active.
Any number of import instances can be used. Deciding which one should take over can be done manually
or through a monitoring + election script.


Marco Mariani's avatar
Marco Mariani committed
68
You should provide the bin/{mysoftware}-importer script, see for instance
69

70 71
  https://lab.nexedi.com/nexedi/slapos/blob/HEAD/slapos/recipe/postgres/__init__.py#L233
  https://lab.nexedi.com/nexedi/slapos/blob/HEAD/slapos/recipe/mydumper.py#L71
72 73 74



Marco Mariani's avatar
Marco Mariani committed
75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127

In practice
-----------

Add resilience to your software

Let's say you already have a file instance-mysoftware.cfg.in that instantiates your
software. In which there is a part [mysoftware] where there is the main recipe
that instantiates the program.

You need to create two new files, instance-mysoftware-import.cfg.in and
instance-mysoftware-export.cfg.in, following this layout:


IMPORT:

[buildout]
extends = ${instance-mysoftware:output}
          ${pbsready-import:output}

parts +=
    mysoftware
    import-on-notification

[importer]
recipe = YourImportRecipe
wrapper = $${rootdirectory:bin}/$${slap-parameter:namebase}-importer
backup-directory = $${directory:backup}
...



EXPORT:

[buildout]
extends = ${instance-mysoftware:output}
          ${pbsready-export:output}

parts +=
    mysoftware
    cron-entry-backup

[exporter]
recipe = YourExportRecipe
wrapper = $${rootdirectory:bin}/$${slap-parameter:namebase}-exporter
backup-directory = $${directory:backup}
...


In the [exporter] / [importer] part, you are free to do whatever you want, but
you need to dump / import your data from $${directory:backup} and specify a
wrapper. I suggest you only add options and specify your export/import recipe.

128

Marco Mariani's avatar
Marco Mariani committed
129 130 131 132 133 134 135


Checking that it works
----------------------

To check that your software instance is resilient you can proceed this way:
Once all instances are successfully deployed, go to your export instance, connect as the instance user and run:
136
$ ~/bin/exporter
Marco Mariani's avatar
Marco Mariani committed
137
It is the script responsible for triggering the resiliency stack on your instance. After doing a backup of your data, it will notify the pull-backup instances of a new backup, triggering the transfer of this data to the import instances.
138

Marco Mariani's avatar
Marco Mariani committed
139
Once this script is run successfully, go to your import instance, connect as its instance user and check ~/srv/backup/"your sofwtare"/, the location of the data you wanted to receive. The last part of the resiliency is up to your import script.
140 141

DEBUGGING:
Marco Mariani's avatar
Marco Mariani committed
142 143 144
Here is a partial list of things you can check to understand what is causing the problem:

- Check that your import script does not fail and successfully places your data in ~/srv/backup/"your software" (as the import instance user) by runnig:
145 146 147 148
$ ~/bin/"your software"-exporter
- Check the export instance script is run successfully as this instance user by running:
$ ~/bin/exporter
- Check the pull-instance system did its job by going to one of your pull-backup instance, connect as its user and check the log : ~/var/log/equeue.log
Marco Mariani's avatar
Marco Mariani committed
149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216


-----------------------------------------------------------------------------------------

Finally, instance-mysoftware-import.cfg.in and
instance-mysoftware-export.cfg.in need to be downloaded and accessible by
switch_softwaretype, and you need to extend stack/resilient/buildout.cfg and
stack/resilient/switchsoftware.cfg to download the whole resiliency bundle.

Here is how it's done in the mariadb case for the lamp stack:



 ** buildout.cfg **

extends =
   ../resilient/buildout.cfg

[instance-mariadb-import]
recipe = slapos.recipe.template
url = ${:_profile_base_location_}/mariadb/instance-mariadb-import.cfg.in
output = ${buildout:directory}/instance-mariadb-import.cfg
md5sum = ...
mode = 0644

[instance-mariadb-export]
recipe = slapos.recipe.template
url = ${:_profile_base_location_}/mariadb/instance-mariadb-export.cfg.in
output = ${buildout:directory}/instance-mariadb-export.cfg
md5sum = ...
mode = 0644



 ** instance.cfg.in **

extends =
  ../resilient/switchsoftware.cfg

[switch-softwaretype]
...
mariadb = ${instance-mariadb:output}
mariadb-import = ${instance-mariadb-import:output}
mariadb-export = ${instance-mariadb-export:output}
...



Then, in the .cfg file where you want to instantiate your software, you can do, instead of requesting your software

 * template-resilient.cfg.in *

[buildout]
...
parts +=
  {{ parts.replicate("Name","3") }}
  ...

[...]
...
[ArgLeader]
...

[ArgBackup]
...

{{ replicated.replicate("Name", "3",
                        "mysoftware-export", "mysoftware-import",
217
                        "ArgLeader","ArgBackup", slapparameter_dict=slapparameter_dict) }}
Marco Mariani's avatar
Marco Mariani committed
218 219 220

and it'll expend into the sections require to request Name0, Name1 and Name2,
backuped and resilient. The leader will expend the section [ArgLeader], backups
221 222
will expend [ArgBackup]. slapparameter_dict is the dict containing the parameters given to the instance. If you don't need to specify any options, you can
omit the last three arguments in replicate().
Marco Mariani's avatar
Marco Mariani committed
223 224 225 226 227 228 229

Since you will compile your template with jinja2, there should be no $${},
because it is not yet possible to use jinja2 -> buildout template.

To compile with jinja2, see jinja2's recipe.


230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262
Deploying your resilient software
---------------------------------
You can provide sla parameters to each request you make (a lot: for export, import and pbs).

example:
Here is a small example of parameters you can provide to control the deployment (case of a runner):
<?xml version='1.0' encoding='utf-8'?>
<instance>
  <parameter id="-sla-1-computer_guid">COMP-GRP1</parameter>
  <parameter id="-sla-pbs1-computer_guid">COMP-PBS1</parameter>
  <parameter id="-sla-2-computer_guid">COMP-GROUP2</parameter>
  <parameter id="-sla-runner2-computer_guid">COMP-RUN2</parameter>
  <parameter id="-sla-2-network_guid">NET-2</parameter>
  <parameter id="-sla-runner0-computer_guid">COMP-RUN0</parameter>
</instance>
Consequence on sla parameters by request:
  * runner0: computer_guid = COMP-RUN0 (provided directly)
  * runner1: computer_guid = COMP-GRP1 (provided by group 1)
  * runner2: computer_guid = COMP-RUN2 (provided by group 2 but overided directly)
             network_guid  = NET-2 (provided by group 2)
  * PBS 1:   computer_guid = COMP-PBS1 (provided by group 1 but overided directly)
  * PBS 2:   computer_guid = COMP-GRP2 (provided by group 2)
             network_guid  = NET-2 (provided by group 2)


Parameters are analysed this way:
 * If it starts with "-sla-" it is not transmitted to requested instance and is used to do the request as sla.
 * -sla-foo-bar=example (foo being a magic key) will be use for each request considering "foo" as a key to use and the sla parameter is "bar". So for each group using the "foo" key, sla parameter "bar" is used with value "example"
About magic keys:
We can find 2 kinds of magic keys:
 * id : example, in "-sla-2-foo" 2 is the magic key and the parameter will be used for each request with id 2 (in case of kvm: kvm2 and PBS 2)
 * nameid : example, in "-sla-kvm2-foo", foo will be used for kvm2 request. Name for pbs is "pbs" -> "-sla-pbs2-foo".
IMPORTANT NOTE: in case the same foo parameter is asked for the group, the nameid key prevail