README.rst 8.96 KB

Base resilient stack

This stack is meant to be extended by SR profiles, or other stacks, that need to provide automated backup/restore, election of backup candidates, and instance failover.

As reference implementations, both stack/lamp and stack/lapp define resilient behavior for MySQL and Postgres respectively.

This involves three different software_types:

  • pull-backup
  • {mysoftware}_export
  • {mysoftware}_import

where 'mysoftware' is the component that needs resiliency (can be postgres, mysql, erp5, and so on).


This software type is defined in

and there should be no reason to modify or extend it.

An instance of type 'pull-backup' will receive data from an 'export' instance and immediately populate an 'import' instance. The backup data is automatically used to build an historical, incremental archive in srv/backup/pbs.



This is the active instance - the one providing live data to the application.

A backup is run via the bin/exporter script: it will
  1. run bin/{mysoftware}-backup

and 2) notify the pull-backup instance that data is ready.

The pull-backup, upon receiving the notification, will make a copy of the data and transmit it to the 'import' instances.

You should provide the bin/{mysoftware}-exporter script, see for instance
By default, as defined in

the bin/exporter script is run every 60 minutes.



This is the fallback instance - the one that can be activated and thus become active. Any number of import instances can be used. Deciding which one should take over can be done manually or through a monitoring + election script.

You should provide the bin/{mysoftware}-importer script, see for instance

In practice

Add resilience to your software

Let's say you already have a file that instantiates your software. In which there is a part [mysoftware] where there is the main recipe that instantiates the program.

You need to create two new files, and, following this layout:


[buildout] extends = ${instance-mysoftware:output}

parts +=
mysoftware import-on-notification

[importer] recipe = YourImportRecipe wrapper = $${rootdirectory:bin}/$${slap-parameter:namebase}-importer backup-directory = $${directory:backup} ...


[buildout] extends = ${instance-mysoftware:output}

parts +=
mysoftware cron-entry-backup

[exporter] recipe = YourExportRecipe wrapper = $${rootdirectory:bin}/$${slap-parameter:namebase}-exporter backup-directory = $${directory:backup} ...

In the [exporter] / [importer] part, you are free to do whatever you want, but you need to dump / import your data from $${directory:backup} and specify a wrapper. I suggest you only add options and specify your export/import recipe.

Checking that it works

To check that your software instance is resilient you can proceed this way: Once all instances are successfully deployed, go to your export instance, connect as the instance user and run: $ ~/bin/exporter It is the script responsible for triggering the resiliency stack on your instance. After doing a backup of your data, it will notify the pull-backup instances of a new backup, triggering the transfer of this data to the import instances.

Once this script is run successfully, go to your import instance, connect as its instance user and check ~/srv/backup/"your sofwtare"/, the location of the data you wanted to receive. The last part of the resiliency is up to your import script.

DEBUGGING: Here is a partial list of things you can check to understand what is causing the problem:

  • Check that your import script does not fail and successfully places your data in ~/srv/backup/"your software" (as the import instance user) by runnig:

$ ~/bin/"your software"-exporter - Check the export instance script is run successfully as this instance user by running: $ ~/bin/exporter - Check the pull-instance system did its job by going to one of your pull-backup instance, connect as its user and check the log : ~/var/log/equeue.log

Finally, and need to be downloaded and accessible by switch_softwaretype, and you need to extend stack/resilient/buildout.cfg and stack/resilient/switchsoftware.cfg to download the whole resiliency bundle.

Here is how it's done in the mariadb case for the lamp stack:

** buildout.cfg **
extends =

[instance-mariadb-import] recipe = slapos.recipe.template url = ${:_profile_base_location_}/mariadb/ output = ${buildout:directory}/instance-mariadb-import.cfg md5sum = ... mode = 0644

[instance-mariadb-export] recipe = slapos.recipe.template url = ${:_profile_base_location_}/mariadb/ output = ${buildout:directory}/instance-mariadb-export.cfg md5sum = ... mode = 0644

** **
extends =

[switch-softwaretype] ... mariadb = ${instance-mariadb:output} mariadb-import = ${instance-mariadb-import:output} mariadb-export = ${instance-mariadb-export:output} ...

Then, in the .cfg file where you want to instantiate your software, you can do, instead of requesting your software

  • *

[buildout] ... parts +=

{{ parts.replicate("Name","3") }} ...

[...] ... [ArgLeader] ...

[ArgBackup] ...

{{ replicated.replicate("Name", "3",
"mysoftware-export", "mysoftware-import", "ArgLeader","ArgBackup", slapparameter_dict=slapparameter_dict) }}

and it'll expend into the sections require to request Name0, Name1 and Name2, backuped and resilient. The leader will expend the section [ArgLeader], backups will expend [ArgBackup]. slapparameter_dict is the dict containing the parameters given to the instance. If you don't need to specify any options, you can omit the last three arguments in replicate().

Since you will compile your template with jinja2, there should be no $${}, because it is not yet possible to use jinja2 -> buildout template.

To compile with jinja2, see jinja2's recipe.

Deploying your resilient software

You can provide sla parameters to each request you make (a lot: for export, import and pbs).

example: Here is a small example of parameters you can provide to control the deployment (case of a runner): <?xml version='1.0' encoding='utf-8'?> <instance>

<parameter id="-sla-1-computer_guid">COMP-GRP1</parameter> <parameter id="-sla-pbs1-computer_guid">COMP-PBS1</parameter> <parameter id="-sla-2-computer_guid">COMP-GROUP2</parameter> <parameter id="-sla-runner2-computer_guid">COMP-RUN2</parameter> <parameter id="-sla-2-network_guid">NET-2</parameter> <parameter id="-sla-runner0-computer_guid">COMP-RUN0</parameter>

</instance> Consequence on sla parameters by request:

  • runner0: computer_guid = COMP-RUN0 (provided directly)

  • runner1: computer_guid = COMP-GRP1 (provided by group 1)

  • runner2: computer_guid = COMP-RUN2 (provided by group 2 but overided directly)

    network_guid = NET-2 (provided by group 2)

  • PBS 1: computer_guid = COMP-PBS1 (provided by group 1 but overided directly)

  • PBS 2: computer_guid = COMP-GRP2 (provided by group 2)

    network_guid = NET-2 (provided by group 2)

Parameters are analysed this way:
  • If it starts with "-sla-" it is not transmitted to requested instance and is used to do the request as sla.
  • -sla-foo-bar=example (foo being a magic key) will be use for each request considering "foo" as a key to use and the sla parameter is "bar". So for each group using the "foo" key, sla parameter "bar" is used with value "example"

About magic keys: We can find 2 kinds of magic keys:

  • id : example, in "-sla-2-foo" 2 is the magic key and the parameter will be used for each request with id 2 (in case of kvm: kvm2 and PBS 2)
  • nameid : example, in "-sla-kvm2-foo", foo will be used for kvm2 request. Name for pbs is "pbs" -> "-sla-pbs2-foo".

IMPORTANT NOTE: in case the same foo parameter is asked for the group, the nameid key prevail