README.rst 8.56 KB
Newer Older
1 2 3 4 5 6 7 8 9 10 11 12 13

Base resilient stack
====================

This stack is meant to be extended by SR profiles, or other stacks, that need to provide
automated backup/restore, election of backup candidates, and instance failover.

As reference implementations, both stack/lamp and stack/lapp define resilient behavior for
MySQL and Postgres respectively.

This involves three different software_types:

 * pull-backup
Marco Mariani's avatar
Marco Mariani committed
14 15
 * {mysoftware}_export
 * {mysoftware}_import
16

Marco Mariani's avatar
Marco Mariani committed
17
where 'mysoftware' is the component that needs resiliency (can be postgres, mysql, erp5, and so on).
18 19 20 21 22 23 24


pull-backup
-----------

This software type is defined in

25
    https://lab.nexedi.com/nexedi/slapos/blob/HEAD/stack/resilient/instance-pull-backup.cfg.in
26 27 28 29 30 31 32 33 34 35 36 37 38 39

and there should be no reason to modify or extend it.

An instance of type 'pull-backup' will receive data from an 'export' instance and immediately populate an 'import' instance.
The backup data is automatically used to build an historical, incremental archive in srv/backup/pbs.


export
------


This is the *active* instance - the one providing live data to the application.

A backup is run via the bin/exporter script: it will
Marco Mariani's avatar
Marco Mariani committed
40
     1) run bin/{mysoftware}-backup
41 42 43 44
 and 2) notify the pull-backup instance that data is ready.

The pull-backup, upon receiving the notification, will make a copy of the data and transmit it to the 'import' instances.

Marco Mariani's avatar
Marco Mariani committed
45
You should provide the bin/{mysoftware}-exporter script, see for instance
46
  https://lab.nexedi.com/nexedi/slapos/blob/1.0.142/slapos/recipe/mydumper.py#L71
47 48

By default, as defined in
49
  https://lab.nexedi.com/nexedi/slapos/blob/HEAD/stack/resilient/pbsready-export.cfg.in#L27
50 51 52 53 54 55 56 57 58 59 60 61
the bin/exporter script is run every 60 minutes.



import
------

This is the *fallback* instance - the one that can be activated and thus become active.
Any number of import instances can be used. Deciding which one should take over can be done manually
or through a monitoring + election script.


Marco Mariani's avatar
Marco Mariani committed
62
You should provide the bin/{mysoftware}-importer script, see for instance
63

64
  https://lab.nexedi.com/nexedi/slapos/blob/1.0.142/slapos/recipe/mydumper.py#L71
65 66 67



Marco Mariani's avatar
Marco Mariani committed
68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89

In practice
-----------

Add resilience to your software

Let's say you already have a file instance-mysoftware.cfg.in that instantiates your
software. In which there is a part [mysoftware] where there is the main recipe
that instantiates the program.

You need to create two new files, instance-mysoftware-import.cfg.in and
instance-mysoftware-export.cfg.in, following this layout:


IMPORT:

[buildout]
extends = ${instance-mysoftware:output}
          ${pbsready-import:output}

parts +=
    mysoftware
90
    notify-callback
Marco Mariani's avatar
Marco Mariani committed
91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120

[importer]
recipe = YourImportRecipe
wrapper = $${rootdirectory:bin}/$${slap-parameter:namebase}-importer
backup-directory = $${directory:backup}
...



EXPORT:

[buildout]
extends = ${instance-mysoftware:output}
          ${pbsready-export:output}

parts +=
    mysoftware
    cron-entry-backup

[exporter]
recipe = YourExportRecipe
wrapper = $${rootdirectory:bin}/$${slap-parameter:namebase}-exporter
backup-directory = $${directory:backup}
...


In the [exporter] / [importer] part, you are free to do whatever you want, but
you need to dump / import your data from $${directory:backup} and specify a
wrapper. I suggest you only add options and specify your export/import recipe.

121

Marco Mariani's avatar
Marco Mariani committed
122 123 124 125 126 127 128


Checking that it works
----------------------

To check that your software instance is resilient you can proceed this way:
Once all instances are successfully deployed, go to your export instance, connect as the instance user and run:
129
$ ~/bin/exporter
Marco Mariani's avatar
Marco Mariani committed
130
It is the script responsible for triggering the resiliency stack on your instance. After doing a backup of your data, it will notify the pull-backup instances of a new backup, triggering the transfer of this data to the import instances.
131

Marco Mariani's avatar
Marco Mariani committed
132
Once this script is run successfully, go to your import instance, connect as its instance user and check ~/srv/backup/"your sofwtare"/, the location of the data you wanted to receive. The last part of the resiliency is up to your import script.
133 134

DEBUGGING:
Marco Mariani's avatar
Marco Mariani committed
135 136 137
Here is a partial list of things you can check to understand what is causing the problem:

- Check that your import script does not fail and successfully places your data in ~/srv/backup/"your software" (as the import instance user) by runnig:
138 139 140 141
$ ~/bin/"your software"-exporter
- Check the export instance script is run successfully as this instance user by running:
$ ~/bin/exporter
- Check the pull-instance system did its job by going to one of your pull-backup instance, connect as its user and check the log : ~/var/log/equeue.log
Marco Mariani's avatar
Marco Mariani committed
142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209


-----------------------------------------------------------------------------------------

Finally, instance-mysoftware-import.cfg.in and
instance-mysoftware-export.cfg.in need to be downloaded and accessible by
switch_softwaretype, and you need to extend stack/resilient/buildout.cfg and
stack/resilient/switchsoftware.cfg to download the whole resiliency bundle.

Here is how it's done in the mariadb case for the lamp stack:



 ** buildout.cfg **

extends =
   ../resilient/buildout.cfg

[instance-mariadb-import]
recipe = slapos.recipe.template
url = ${:_profile_base_location_}/mariadb/instance-mariadb-import.cfg.in
output = ${buildout:directory}/instance-mariadb-import.cfg
md5sum = ...
mode = 0644

[instance-mariadb-export]
recipe = slapos.recipe.template
url = ${:_profile_base_location_}/mariadb/instance-mariadb-export.cfg.in
output = ${buildout:directory}/instance-mariadb-export.cfg
md5sum = ...
mode = 0644



 ** instance.cfg.in **

extends =
  ../resilient/switchsoftware.cfg

[switch-softwaretype]
...
mariadb = ${instance-mariadb:output}
mariadb-import = ${instance-mariadb-import:output}
mariadb-export = ${instance-mariadb-export:output}
...



Then, in the .cfg file where you want to instantiate your software, you can do, instead of requesting your software

 * template-resilient.cfg.in *

[buildout]
...
parts +=
  {{ parts.replicate("Name","3") }}
  ...

[...]
...
[ArgLeader]
...

[ArgBackup]
...

{{ replicated.replicate("Name", "3",
                        "mysoftware-export", "mysoftware-import",
210
                        "ArgLeader","ArgBackup", slapparameter_dict=slapparameter_dict) }}
Marco Mariani's avatar
Marco Mariani committed
211 212 213

and it'll expend into the sections require to request Name0, Name1 and Name2,
backuped and resilient. The leader will expend the section [ArgLeader], backups
214 215
will expend [ArgBackup]. slapparameter_dict is the dict containing the parameters given to the instance. If you don't need to specify any options, you can
omit the last three arguments in replicate().
Marco Mariani's avatar
Marco Mariani committed
216 217 218 219 220 221 222

Since you will compile your template with jinja2, there should be no $${},
because it is not yet possible to use jinja2 -> buildout template.

To compile with jinja2, see jinja2's recipe.


223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255
Deploying your resilient software
---------------------------------
You can provide sla parameters to each request you make (a lot: for export, import and pbs).

example:
Here is a small example of parameters you can provide to control the deployment (case of a runner):
<?xml version='1.0' encoding='utf-8'?>
<instance>
  <parameter id="-sla-1-computer_guid">COMP-GRP1</parameter>
  <parameter id="-sla-pbs1-computer_guid">COMP-PBS1</parameter>
  <parameter id="-sla-2-computer_guid">COMP-GROUP2</parameter>
  <parameter id="-sla-runner2-computer_guid">COMP-RUN2</parameter>
  <parameter id="-sla-2-network_guid">NET-2</parameter>
  <parameter id="-sla-runner0-computer_guid">COMP-RUN0</parameter>
</instance>
Consequence on sla parameters by request:
  * runner0: computer_guid = COMP-RUN0 (provided directly)
  * runner1: computer_guid = COMP-GRP1 (provided by group 1)
  * runner2: computer_guid = COMP-RUN2 (provided by group 2 but overided directly)
             network_guid  = NET-2 (provided by group 2)
  * PBS 1:   computer_guid = COMP-PBS1 (provided by group 1 but overided directly)
  * PBS 2:   computer_guid = COMP-GRP2 (provided by group 2)
             network_guid  = NET-2 (provided by group 2)


Parameters are analysed this way:
 * If it starts with "-sla-" it is not transmitted to requested instance and is used to do the request as sla.
 * -sla-foo-bar=example (foo being a magic key) will be use for each request considering "foo" as a key to use and the sla parameter is "bar". So for each group using the "foo" key, sla parameter "bar" is used with value "example"
About magic keys:
We can find 2 kinds of magic keys:
 * id : example, in "-sla-2-foo" 2 is the magic key and the parameter will be used for each request with id 2 (in case of kvm: kvm2 and PBS 2)
 * nameid : example, in "-sla-kvm2-foo", foo will be used for kvm2 request. Name for pbs is "pbs" -> "-sla-pbs2-foo".
IMPORTANT NOTE: in case the same foo parameter is asked for the group, the nameid key prevail