Goal
Audit the current state of CDN slaves by validating their parameters against the expected JSON schema. This makes it possible to identify entries that need adjustment and to begin asking users to update their CDN entries (especially for custom domains, which require DNS proof of ownership).
Passive & transparent
This MR does not change how slaves are processed. The existing slave deployment pipeline is untouched — slaves continue to be requested and deployed exactly as before. The new validation layer runs alongside it, feeding results into a local SQLite database that CDN operators can query (e.g. via sqlite-web) to audit slave health. No slave is rejected, blocked, or modified by this change.
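To illustrate the kind of audit an operator could run against the local database (whether through sqlite-web or a plain SQLite client), here is a minimal sketch. The table and column names are hypothetical, chosen for illustration only; the actual schema used by the recipe may differ.

```python
import sqlite3

# Hypothetical schema: the real table/column names in the instance DB may
# differ; this only illustrates the kind of audit query an operator (or
# sqlite-web) would run against the stored validation results.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE instance (
        reference TEXT PRIMARY KEY,
        validation_state TEXT,   -- 'valid' or 'invalid'
        errors TEXT              -- JSON-encoded schema errors, if any
    )
""")
conn.executemany(
    "INSERT INTO instance VALUES (?, ?, ?)",
    [
        ("slave-1", "valid", None),
        ("slave-2", "invalid", '["custom_domain: missing DNS proof"]'),
    ],
)

# List slaves whose parameters failed schema validation.
invalid = conn.execute(
    "SELECT reference, errors FROM instance WHERE validation_state = 'invalid'"
).fetchall()
for reference, errors in invalid:
    print(reference, errors)
```

Because the pipeline is passive, such a query is purely observational: it surfaces invalid slaves without affecting their deployment.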
Roadmap
This MR is step 1 of a 3-step plan:
- Audit & notify (this MR) — validate slaves passively, store results in a local DB for operator audit, ask users to update invalid CDN entries for custom domains
- Raise tickets — automatically create tickets for invalid slaves
- Enforce — reject invalid slaves entirely
Architecture
This MR introduces a validation pipeline backed by local SQLite databases, running alongside the existing slave deployment.
New components
- LocalInstanceDB (slapos/recipe/localinstancedb.py): SQLite-backed database for persisting instance lists, tracking parameter hashes to detect changes, and comparing instance states (added/modified/removed).
- slapconfiguration with JSON Schema validation (slapos/recipe/slapconfiguration.py): New JsonSchemaWithDB variants that validate slave parameters against a JSON schema and store results (valid/invalid + errors) in the local database.
- Instance Node (slapos/recipe/instancenode.py): Core processing engine that reads the validated instance database, computes deltas against previously deployed instances, and runs a per-instance lifecycle:
  a. Schema validation check (from the slapconfiguration phase)
  b. Pre-deploy validation (extensible hook)
  c. Deployment
  d. Post-deploy validation (extensible hook)
  e. Publish results or errors to the master
- CDN Instance Node (slapos/recipe/cdninstancenode.py): CDN-specific extension that adds:
- DNS domain ownership verification: for custom domains, generates an HMAC challenge and checks for a _slapos-challenge TXT record proving the domain owner controls DNS. This ensures only legitimate domain owners can register custom domains on the CDN.
- SSL certificate validation: verifies that existing certificates match the requested domain and are not expired.
- Server-alias conflict detection: detects when multiple slaves claim the same domain.
- No instance requests to the master — validation happens locally, and reprocessing is triggered via bang when the validation state changes.
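The DNS ownership check above can be sketched as follows. The key material, message layout, and exact record contents used by the real recipe are not specified in this MR, so everything here (the `expected_challenge` construction, the injected resolver) is an illustrative assumption; only the `_slapos-challenge` TXT record name comes from the description above.

```python
import hmac
import hashlib

def expected_challenge(secret_key: bytes, slave_reference: str, domain: str) -> str:
    """Token the domain owner must publish in a _slapos-challenge TXT record.

    Hypothetical construction: the real recipe's key material and message
    layout may differ; this only illustrates the HMAC challenge idea.
    """
    message = ("%s:%s" % (slave_reference, domain)).encode()
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()

def verify_domain_ownership(secret_key, slave_reference, domain, lookup_txt):
    """Check that _slapos-challenge.<domain> carries the expected token.

    `lookup_txt` is an injected resolver (name -> list of TXT strings),
    so the check stays testable without real DNS traffic.
    """
    token = expected_challenge(secret_key, slave_reference, domain)
    records = lookup_txt("_slapos-challenge." + domain)
    return any(hmac.compare_digest(token, r) for r in records)

# Example with a fake resolver standing in for DNS:
key = b"node-secret"
token = expected_challenge(key, "slave-2", "www.example.com")
fake_dns = {"_slapos-challenge.www.example.com": [token]}
ok = verify_domain_ownership(
    key, "slave-2", "www.example.com", lambda name: fake_dns.get(name, [])
)
print(ok)  # True for a correctly published challenge
```

Injecting the resolver keeps the ownership check decoupled from the DNS library in use and makes it straightforward to unit-test.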
Crontab execution
The CDN Instance Node runs periodically via crontab, independently of slapgrid cycles. Publishing a DNS challenge TXT record does not change any slave parameter, so it triggers no slapgrid reprocessing; the crontab ensures that newly published challenges are picked up and validated without waiting for a parameter change.
Data flow
SlapOS Master (slave-instance-list)
  │
  ▼
slapconfiguration (JSON Schema validation)
  │  stores valid/invalid + errors
  ▼
Instance DB (SQLite) ◄── queryable by operator (sqlite-web)
  │
  ▼
CDN Instance Node (crontab)
  │  DNS challenge, SSL check, conflict detection
  ▼
Request DB (SQLite) ◄── queryable by operator (sqlite-web)
  │
  ▼
SlapOS Master (publish errors / connection params / bang)
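The delta step in this flow (comparing the instance list against what was previously deployed) can be sketched as below. This is a simplified, hypothetical version of what LocalInstanceDB does: the table layout and function names are assumptions, but the mechanism — hashing each slave's parameters and classifying entries as added/modified/removed — follows the description above.

```python
import hashlib
import json
import sqlite3

def params_hash(parameters):
    # Canonical JSON so equal parameter dicts always hash identically.
    return hashlib.sha256(
        json.dumps(parameters, sort_keys=True).encode()
    ).hexdigest()

def compute_delta(conn, current):
    """current: {reference: parameter_dict} as received from the master."""
    stored = dict(conn.execute("SELECT reference, hash FROM instance"))
    added = [r for r in current if r not in stored]
    removed = [r for r in stored if r not in current]
    modified = [
        r for r in current
        if r in stored and stored[r] != params_hash(current[r])
    ]
    return added, modified, removed

# Illustrative state: two slaves already recorded in the local DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instance (reference TEXT PRIMARY KEY, hash TEXT)")
conn.execute("INSERT INTO instance VALUES (?, ?)",
             ("slave-1", params_hash({"url": "https://old.example.com"})))
conn.execute("INSERT INTO instance VALUES (?, ?)",
             ("slave-2", params_hash({"url": "https://keep.example.com"})))

delta = compute_delta(conn, {
    "slave-1": {"url": "https://new.example.com"},   # changed  -> modified
    "slave-2": {"url": "https://keep.example.com"},  # unchanged
    "slave-3": {"url": "https://fresh.example.com"}, # new      -> added
})
print(delta)  # (['slave-3'], ['slave-1'], [])
```

Hashing canonicalized parameters means the node only reprocesses slaves whose effective configuration changed, not every slave on every pass.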
TODO
Before release
- Garbage collection of stopped or destroyed slaves?
- Add egg tests
- Add a subroutine to check domain validation / non-SlapOS-dependent checks
- Request 1000 shared instances to test scalability and garbage collection
- Add a promise for failing instances to help operators debug
- Clean up the connection parameters and parameters for slaves sent to and received from the master
- Add SR tests
- Add bang to instance node deploy, otherwise the master node is not reprocessed
After release (to be validated)
- slapconfiguration jsonschema slaves: the master doesn't send the SR of the slave, so how can the schema be checked?
- Provide a way for the Hosted Instance to access errors, since a ticket has no space for them. Maybe only publish a parameter?
- Database resiliency: what happens if the list of validated domains is lost?