Draft: rapid-cdn: move to instance node management with local instance database
Introduction
This merge request is the follow-up of !1947 (closed), which was abandoned because SlapOS Master doesn't support having thousands of instances requested by the same instance inside the same instance tree. To solve the issue, it was decided to integrate the work directly in the rapid-cdn SR.
Why
This merge request introduces CDN Requester. The initial goal of this tool is to separate the technical CDN from the management of CDN requests that are related to sales. This comes from the need to validate domain name ownership and move towards allowing any customer to use the CDN without risks.
Once this SR is released, the goal is to progressively move all existing CDN instances to this new SR, with the Instance Node available in any project.
The first implementation managed the instances hosted in the Instance Node (also called slave instances) in the traditional way. But the constraints of this approach are well known when the number of instances rises, so it was decided to move to an implementation fully in Python to ensure scalability.
Requirements
- Be able to host a large number of instances
- The instance list should not be stored in buildout (it leaves traces in logs and causes large reprocessing)
- Only reprocess what needs to be reprocessed
- Garbage collection: if and only if an instance is destroyed or stopped by the user, the CDN entry should also be removed
- Persistence of DNS validation: once a domain has been validated for an instance, there is no need to revalidate it on each call
Considerations about the Instance Node
Instances in the Instance Node are independent from the instance hosting them. To understand this, note that two actors are involved when using an Instance Node:
- The Instance Operator: The one managing the host instance
- The users requesting instances on the Instance Node; those instances are also called "Shared"/"Slave" instances.
In SlapOS we need to alert only the actor that should act to solve a given issue, just like we don't inform the compute node operator when a single instance is failing.
For example, if an instance hosted on the Instance Node is failing because the user provided incorrect parameters, because the DNS validation is not done yet, or because it has not finished deploying on the CDN, only the user of that instance should be informed; no alert should be raised on the Instance Node, as the failure has no impact from the instance operator's point of view.
Conclusion
In the long term, an Instance Node should be independent in its processing of the instances. Eventually there should be no need to reprocess the host instance if a new instance is allocated on it (not yet available with the current API).
It should reproduce the good practices of slapos node instance:
- Only process instances that need to be processed
- Have a promise validate the instance deployment
- Reprocess until the promise passes.
Implementation
LocalInstanceDB
Introduce a generic Recipe (slapos/recipe/localinstancedb.py) to:
- Offer a generic class to manage an SQLite database
- Store the list of instances
- Compare list of instances to calculate what is:
- New
- Modified
- Removed
The InstanceListComparator class performs hash-based comparison: it computes SHA256 hashes of JSON-serialized parameters (sorted keys) to efficiently detect changes without full comparison. This prevents unnecessary reprocessing when only validation status changes.
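As a minimal sketch of this comparison, assuming each side is a plain dict of {instance_reference: parameters} (the function names are illustrative, not the actual InstanceListComparator API):

```python
import hashlib
import json


def parameter_hash(parameters):
    # Hash the JSON-serialized parameters with sorted keys so that two
    # semantically identical dicts always produce the same digest.
    serialized = json.dumps(parameters, sort_keys=True)
    return hashlib.sha256(serialized.encode('utf-8')).hexdigest()


def compare_instance_lists(previous, current):
    # previous / current: {instance_reference: parameter_dict}
    previous_hashes = {ref: parameter_hash(p) for ref, p in previous.items()}
    current_hashes = {ref: parameter_hash(p) for ref, p in current.items()}
    new = [ref for ref in current_hashes if ref not in previous_hashes]
    removed = [ref for ref in previous_hashes if ref not in current_hashes]
    modified = [
        ref for ref in current_hashes
        if ref in previous_hashes and previous_hashes[ref] != current_hashes[ref]
    ]
    return new, modified, removed
```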
slapconfiguration
Add a new entry point slapconfiguration.jsonschema.localdb to store the list of instances in a local database. It also stores the validation state of the parameters, as it inherits from the JSONSchema entry point, which validates the instance parameters against the instance schema.
The entry point writes validated instances to the database at instance-db-path, storing both valid and invalid instances with their validation results. This database is then read by the Instance Node to determine what needs processing.
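For illustration, a minimal sketch of what storing one validation result could look like, assuming a plain SQLite table (the table layout, column names and the store_instance helper are illustrative, not the actual schema used by the recipe):

```python
import json
import sqlite3


def store_instance(db_path, reference, parameters, is_valid, error_message=None):
    # Both valid and invalid instances are stored, together with the
    # validation result, so the Instance Node can decide what to (re)process.
    with sqlite3.connect(db_path) as connection:
        connection.execute(
            "CREATE TABLE IF NOT EXISTS instance ("
            " reference TEXT PRIMARY KEY,"
            " parameters TEXT,"
            " valid INTEGER,"
            " error TEXT)")
        connection.execute(
            "INSERT OR REPLACE INTO instance VALUES (?, ?, ?, ?)",
            (reference, json.dumps(parameters, sort_keys=True),
             1 if is_valid else 0, error_message))
```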
Instance Node
First attempt at the implementation of an Instance Node (slapos/recipe/instancenode.py). The initial implementation was done so that it can be used as a recipe, but it was later extended to also run as a script called every minute by cron.
Main loop
Here are the main steps:
- Get the list of instances to process from the master (stored in the database filled by slapconfiguration at instance-db-path)
- Compare it to the list of instances we processed (stored at requestinstance-db-path) to see:
  - New instances
  - Modified instances
  - Removed instances
- Get the list of instances that need reprocessing (instances marked as invalid in the database that haven't been modified or removed)
- For each instance in: new, modified, needs reprocessing:
  - Process the instance
- For each removed instance:
  - Destroy the instance
The comparison uses InstanceListComparator which compares hashes to detect parameter changes. Instances are only considered modified if their parameter hash changed.
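A rough sketch of this main loop, assuming the node object exposes the helpers shown (load_instances, list_invalid_instances, process_instance and destroy_instance are illustrative names, not the actual API) and reusing compare_instance_lists from the LocalInstanceDB sketch above:

```python
def run_once(node):
    # Instances known to the master, as filled by slapconfiguration.
    master_instances = node.load_instances(node.instance_db_path)
    # Instances already processed during a previous run.
    processed_instances = node.load_instances(node.requestinstance_db_path)

    new, modified, removed = compare_instance_lists(
        processed_instances, master_instances)

    # Instances stored as invalid that were neither modified nor removed
    # are retried on every run until they eventually pass.
    to_retry = [
        ref for ref in node.list_invalid_instances()
        if ref not in modified and ref not in removed]

    for reference in new + modified + to_retry:
        node.process_instance(reference, master_instances[reference])

    for reference in removed:
        node.destroy_instance(reference)
```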
Instance Processing
The processing of an instance is as follows:
- Check whether the parameters are valid against the JSON Schema (validation done by slapconfiguration and stored in the DB)
- validateInstancePreDeployment: extra validation of parameters. For CDN Requester this is where we check DNS
- Deploy instance
- Validate the instance post-deployment (promise)
- Mark the instance as okay
If at any step the validation fails, the processing stops and the error is returned to the user by publishing it.
Any step can be overridden by inheriting the InstanceNode class. This is what the CDN Requester does to perform specialized validation for the CDN request.
Connection parameters are only published if they differ from what's already stored in the database, avoiding unnecessary updates.
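A condensed sketch of this pipeline and its overridable hooks; validateInstancePreDeployment matches the name given above, while the other method names (deployInstance, validateInstancePostDeployment, processInstance, publishError, markInstanceOkay) are illustrative assumptions, not the actual InstanceNode API:

```python
class InstanceNodeSketch:
    def publishError(self, reference, error):
        # Publish the error back to the user of that instance only.
        print('instance %s failed: %s' % (reference, error))

    def markInstanceOkay(self, reference):
        print('instance %s is okay' % reference)

    def deployInstance(self, reference, parameters):
        pass  # the actual deployment (CDN request) would happen here

    def validateInstancePreDeployment(self, reference, parameters):
        # Hook: extra parameter validation before deployment.
        # CDN Requester overrides this to perform the DNS checks.
        return True, None

    def validateInstancePostDeployment(self, reference):
        # Hook: promise-like check that the deployment actually succeeded.
        return True, None

    def processInstance(self, reference, parameters, schema_valid, schema_error):
        # 1. JSON Schema validation result computed by slapconfiguration.
        if not schema_valid:
            return self.publishError(reference, schema_error)
        # 2. Pre-deployment validation.
        ok, error = self.validateInstancePreDeployment(reference, parameters)
        if not ok:
            return self.publishError(reference, error)
        # 3. Deploy the instance.
        self.deployInstance(reference, parameters)
        # 4. Post-deployment validation (promise).
        ok, error = self.validateInstancePostDeployment(reference)
        if not ok:
            return self.publishError(reference, error)
        # 5. Mark the instance as okay.
        self.markInstanceOkay(reference)
```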
Instance destruction
This method ensures clean destruction of the instance before removing it from our list of instances. By default it requests the CDN instance in destroyed state, so the instance is properly cleaned up on the master before being removed from the local database.
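A minimal sketch of that default behaviour, with illustrative helper names (request_cdn_instance and remove_from_local_db are assumptions, not the actual API):

```python
def destroy_instance(node, reference):
    # Request the underlying CDN instance in "destroyed" state first so the
    # master properly cleans it up, then drop it from the local database.
    node.request_cdn_instance(reference, state='destroyed')
    node.remove_from_local_db(reference)
```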
Usage as script
Some tooling has been added here to parse a configuration file, run with a PID file to avoid concurrent runs, and set up proper logging.
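A minimal sketch of such a wrapper, assuming a flock-based PID file lock and basic logging configuration (option names and behaviour are illustrative, not the actual script interface):

```python
import argparse
import fcntl
import logging
import os
import sys


def main():
    parser = argparse.ArgumentParser(description='Run the instance node once.')
    parser.add_argument('--config', required=True, help='path to the configuration file')
    parser.add_argument('--pidfile', required=True, help='PID file used as a lock')
    args = parser.parse_args()

    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s %(levelname)s %(message)s')

    # Take an exclusive, non-blocking lock on the PID file so that two cron
    # runs can never overlap; the loser simply exits.
    pid_file = open(args.pidfile, 'w')
    try:
        fcntl.flock(pid_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        logging.info('Another run is already in progress, exiting.')
        return 0
    pid_file.write(str(os.getpid()))
    pid_file.flush()

    # ... parse args.config and run the instance node main loop here ...
    return 0


if __name__ == '__main__':
    sys.exit(main())
```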
CDN Request
This class (slapos/recipe/cdnrequest.py) inherits from InstanceNode and specializes:
- Prevalidation:
  - Domain validation: checks domain ownership via a DNS TXT record containing a validation token (see the sketch after this list)
  - Domain uniqueness: ensures a domain is not already validated for another instance
  - DNS resolution: uses a DNS resolver (dnspython) with a fresh cache at initialization (to bypass the server DNS cache)
  - Host tracking: stores validated domains and used hosts in DomainValidationDB with tables:
    - domain_validation: stores instance_reference, domain, token, validated, timestamp
    - used_hosts: stores host, instance_reference pairs to track host assignments. This ensures an alias cannot be added for an already validated domain.
- Destruction: removes domain validation entries and frees hosts when the instance is destroyed
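A minimal sketch of the domain ownership check using dnspython (dnspython >= 2.0, which provides Resolver.resolve); the function name, the token format and looking the record up on the domain itself are assumptions, not the exact implementation:

```python
import dns.exception
import dns.resolver


def domain_has_validation_token(domain, expected_token):
    # A fresh Resolver is instantiated so no answer cached by a previous run
    # is reused, matching the "fresh cache at initialization" note above.
    resolver = dns.resolver.Resolver()
    try:
        answer = resolver.resolve(domain, 'TXT')
    except dns.exception.DNSException:
        return False
    for record in answer:
        for txt_chunk in record.strings:
            if txt_chunk.decode('utf-8') == expected_token:
                return True
    return False
```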
The CDN Request recipe also validates parameters like in rapid-cdn.
A new constraint has been added on server-alias entries: they must be subdomains of the custom_domain, to avoid multiplying domain ownership verifications.
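One way to express that constraint, as a small illustrative check (not necessarily how the recipe implements it):

```python
def is_allowed_server_alias(alias, custom_domain):
    # An alias is accepted only if it is the custom domain itself or one of
    # its subdomains, so a single ownership verification covers all aliases.
    alias = alias.rstrip('.').lower()
    custom_domain = custom_domain.rstrip('.').lower()
    return alias == custom_domain or alias.endswith('.' + custom_domain)
```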
Notes from initial discussion:
So domain name validation should be done in a separate SR that requests through a remote node to the Node.
The CDN SR is purely technical, now called "technical CDN". The new SR is the business one and does a lot of validation.
Existing Premium CDN Shared instances are all ported to the new SR.
Remove everyone from the Premium CDN project aside from its operators.
The new SR can use Jinja with Buildout.
At the moment the Business CDN SR will only do the domain name validation.
XXX Maybe we want to sign the parameters provided by the Business CDN to make sure we trust the source. After cleaning up the current data.
For each shared instance on the Business CDN, it requests a Shared Instance on the technical CDN.
TODO
TODO Before release
- Garbage collect of stopped or destroyed slaves???
- Check domain name via CDN via software PY
- Do not request to CDN with default parameters
- Add clear parameters for computer uid / instance uid for SLA
- Add egg tests
- Use JSON parameters
- Add subroutine to check domain validation / non-SlapOS-dependent checks
- Request 1000 shared instances to test scalability and garbage collection
- Add promise for failing instances to help operators debug
- Clean up connection parameters and parameters for slaves sent to and received from the master
- Add SR tests
- Add bang to instance node deploy, else the master node is not reprocessed.
TODO after release (to be validated)
- slapconfiguration jsonschema slaves: the Master doesn't send the SR of the slave. How do you check the schema?
- How to create a ticket to inform a user that its instance is failing
- Resiliency of databases?? What happens if the list of validated domains is lost?