-- BACKUP SIGNAL DIAGRAM COMPLEMENT TO BACKUP AMENDMENTS 2003-07-11 --

USER    MASTER          MASTER          SLAVE           SLAVE
---------------------------------------------------------------------
BACKUP_REQ
---------------->
        UTIL_SEQUENCE
        --------------->
        <---------------
        DEFINE_BACKUP
        ------------------------------> (Local signals)
                                        LIST_TABLES
                                        --------------->
                                        <---------------
                                        FSOPEN
                                        --------------->
                                        GET_TABINFO
                                        <---------------
                                        DI_FCOUNT
                                        --------------->
                                        <---------------
                                        DI_GETPRIM
                                        --------------->
                                        <---------------
        <-------------------------------
BACKUP_CONF
<----------------
        CREATE_TRIG
        --------------> (If master crashes here -> rogue triggers/memory leak)
        <--------------
        START_BACKUP
        ------------------------------>
        <------------------------------
        ALTER_TRIG
        -------------->
        <--------------
        WAIT_GCP
        -------------->
        <--------------
        BACKUP_FRAGMENT
        ------------------------------>
                                        SCAN_FRAG
                                        --------------->
                                        <---------------
        <------------------------------
        WAIT_GCP
        -------------->
        <--------------
        DROP_TRIG
        -------------->
        <--------------
        STOP_BACKUP
        ------------------------------>
        <------------------------------
BACKUP_COMPLETE_REP
<----------------

        ABORT_BACKUP
        ------------------------------>

----------------------------------------------------------------------------

USER    BACKUP-MASTER

1) BACKUP_REQ -->

2)        To all slaves DEFINE_BACKUP_REQ
          This signal contains info so that all
          slaves can take over as master
          Tomas: Except triggerId info...

3)        Wait for conf

4) <-- BACKUP_CONF

5)        For each table
              PREP_CREATE_TRIG_REQ
          Wait for conf

6)        To all slaves START_BACKUP_REQ
          Include trigger ids
          Wait for conf

7)        For each table
              CREATE_TRIG_REQ
          Wait for conf

8)        Wait for GCP

9)        For each table
              For each fragment
                  BACKUP_FRAGMENT_REQ -->
                  <-- BACKUP_FRAGMENT_CONF

10)       Wait for GCP

11)       To all slaves STOP_BACKUP_REQ
          This signal turns off logging

12)       Wait for conf

13) <-- BACKUP_COMPLETE_REP

----

Slave: Master Died
  Wait for master take-over, max 30 sec, then abort everything

Slave: Master TakeOver

  BACKUP_STATUS_REQ --> To all nodes
  <-- BACKUP_STATUS_CONF

  BACKUP_STATUS_CONF
    BACKUP_DEFINED
    BACKUP_STARTED
    BACKUP_FRAGMENT

Master: Slave died

-- Define Backup Req --

1) Get backup definition
   Which tables (all)

2) Open files
   Write table list to CTL file

3) Get definitions for all tables in backup

4) Get fragment info

5) Define Backup Conf

-- Define Backup Req --

-- Abort Backup Req --

1) Report to others

2) Stop logging

3) Stop file(s)

4) Stop scan

5) If failure/abort
     Remove files

6) If XXX
     Report to user

7) Clean up records/stuff

-- Abort Backup --

Reasons for aborting:
1a) client abort
1b) slave failure
1c) node failure

Resources to be cleaned up:
Slave responsibility:
2a) Close and remove files
2b) Free allocated resources

Master responsibility:
2c) Drop triggers

USER    MASTER          MASTER          SLAVE           SLAVE
---------------------------------------------------------------------
        BACKUP_ABORT_ORD:
        -------------------------(ALL)-->
        Set Master State ABORTING       Set Slave State ABORTING
        Drop Triggers                   Close and Remove files
                                        CleanupSlaveResources()

        BACKUP_ABORT_ORD:OkToClean
        -------------------------(ALL)-->

        CleanupMasterResources()

BACKUP_ABORT_REP
<---------------

State descriptions:

Master - INITIAL
BACKUP_REQ ->
Master - DEFINING
DEFINE_BACKUP_CONF ->
Master - DEFINED
CREATE_TRIG_CONF ->
Master - STARTED
<--->
Master - SCANNING
WAIT_GCP_CONF ->
Master - STOPPING
(Master - CLEANING)
--------
Master - ABORTING

Slave - INITIAL

DEFINE_BACKUP_REQ ->

Slave - DEFINING
 - backupId
 - tables
DIGETPRIMCONF ->

Slave - DEFINED

START_BACKUP_REQ ->

Slave - STARTED

Slave - SCANNING

STOP_BACKUP_REQ ->

Slave - STOPPING

FSCLOSECONF ->

Slave - CLEANING

-----

Slave - ABORTING

Testcases:

2. Master failure at first START_BACKUP_CONF
<masterId> error 10004
start backup
- Ok

2. Master failure at first CREATE_TRIG_CONF
<masterId> error 10003
start backup
- Ok

2. Master failure at first ALTER_TRIG_CONF
<masterId> error 10005
start backup
- Ok

2. Master failure at WAIT_GCP_CONF
<masterId> error 10007
start backup
- Ok

2. Master failure at WAIT_GCP_CONF, nextFragment
<masterId> error 10008
start backup
- Ok

2. Master failure at WAIT_GCP_CONF, stopping
<masterId> error 10009
start backup
- Ok

2. Master failure at BACKUP_FRAGMENT_CONF
<masterId> error 10010
start backup
- Ok

2. Master failure at first DROP_TRIG_CONF
<masterId> error 10012
start backup
- Ok

1. Master failure at first STOP_BACKUP_CONF
<masterId> error 10013
start backup
- Ok

3. Multiple node failure:
<masterId> error 10001
<otherId> error 10014
start backup
- Ok (note: mgmtsrvr gets BACKUP_ABORT_REP but expects BACKUP_REF, and hangs...)

4. Multiple node failure:
<masterId> error 10007
<takeover id> error 10002
start backup
- Ok

  ndbrequire(!ERROR_INSERTED(10001));
  ndbrequire(!ERROR_INSERTED(10002));
  ndbrequire(!ERROR_INSERTED(10021));
  ndbrequire(!ERROR_INSERTED(10003));
  ndbrequire(!ERROR_INSERTED(10004));
  ndbrequire(!ERROR_INSERTED(10005));
  ndbrequire(!ERROR_INSERTED(10006));
  ndbrequire(!ERROR_INSERTED(10007));
  ndbrequire(!ERROR_INSERTED(10008));
  ndbrequire(!ERROR_INSERTED(10009));
  ndbrequire(!ERROR_INSERTED(10010));
  ndbrequire(!ERROR_INSERTED(10011));
  ndbrequire(!ERROR_INSERTED(10012));
  ndbrequire(!ERROR_INSERTED(10013));
  ndbrequire(!ERROR_INSERTED(10014));
  ndbrequire(!ERROR_INSERTED(10015));
  ndbrequire(!ERROR_INSERTED(10016));
  ndbrequire(!ERROR_INSERTED(10017));
  ndbrequire(!ERROR_INSERTED(10018));
  ndbrequire(!ERROR_INSERTED(10019));
  ndbrequire(!ERROR_INSERTED(10020));

  if (ERROR_INSERTED(10023)) {
  if (ERROR_INSERTED(10023)) {
  if (ERROR_INSERTED(10024)) {
  if (ERROR_INSERTED(10025)) {
  if (ERROR_INSERTED(10026)) {
  if (ERROR_INSERTED(10028)) {
  if (ERROR_INSERTED(10027)) {
  (ERROR_INSERTED(10022))) {
  if (ERROR_INSERTED(10029)) {
  if(trigPtr.p->operation->noOfBytes > 123 && ERROR_INSERTED(10030)) {

----- XXX ---

DEFINE_BACKUP_REF ->
  ABORT_BACKUP_ORD(no reply) when all DEFINE_BACKUP replies have arrived

START_BACKUP_REF
  ABORT_BACKUP_ORD(no reply) when all START_BACKUP_ replies have arrived

BACKUP_FRAGMENT_REF
  ABORT_BACKUP_ORD(reply) directly to all nodes running BACKUP_FRAGMENT

  When all nodes have replied BACKUP_FRAGMENT
    ABORT_BACKUP_ORD(no reply)

STOP_BACKUP_REF
  ABORT_BACKUP_ORD(no reply) when all STOP_BACKUP_ replies have arrived

NF_COMPLETE_REP
  slave dies
    master sends OUTSTANDING_REF to self
    slave does nothing

  master dies
    slave elects self as master and sets only itself as participant
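
----- Sketch: master state transitions ---

The master states listed under "State descriptions" can be read as a small
transition table. The following is a minimal sketch of that reading, not the
code of the backup block. The names MasterState, Signal and nextMasterState
are hypothetical. Two points are assumptions: the notes show only "<--->"
between STARTED and SCANNING, so the sketch guesses that BACKUP_FRAGMENT_REQ
and BACKUP_FRAGMENT_CONF drive that pair, and it treats an abort as legal
from any state.

```cpp
#include <cassert>

// Master states, named after the "State descriptions" list.
enum class MasterState {
  Initial, Defining, Defined, Started, Scanning, Stopping, Cleaning, Aborting
};

// Signals that move the master between states.
// BackupFragmentReq/Conf and Abort are assumptions of this sketch: the notes
// only show "<--->" between STARTED and SCANNING and list ABORTING apart.
enum class Signal {
  BackupReq, DefineBackupConf, CreateTrigConf,
  BackupFragmentReq, BackupFragmentConf, WaitGcpConf, Abort
};

// Writes the next state to 'out' and returns true if 'sig' is expected in
// state 's'; returns false and leaves 'out' untouched otherwise.
bool nextMasterState(MasterState s, Signal sig, MasterState& out) {
  if (sig == Signal::Abort) {  // assumption: abort is accepted in any state
    out = MasterState::Aborting;
    return true;
  }
  switch (s) {
    case MasterState::Initial:
      if (sig == Signal::BackupReq) { out = MasterState::Defining; return true; }
      break;
    case MasterState::Defining:
      if (sig == Signal::DefineBackupConf) { out = MasterState::Defined; return true; }
      break;
    case MasterState::Defined:
      if (sig == Signal::CreateTrigConf) { out = MasterState::Started; return true; }
      break;
    case MasterState::Started:
      if (sig == Signal::BackupFragmentReq) { out = MasterState::Scanning; return true; }
      break;
    case MasterState::Scanning:
      if (sig == Signal::BackupFragmentConf) { out = MasterState::Started; return true; }
      if (sig == Signal::WaitGcpConf) { out = MasterState::Stopping; return true; }
      break;
    default:
      // Stopping, Cleaning, Aborting: no further transitions are listed.
      break;
  }
  return false;
}
```

A caller would keep the current state in a variable and feed each incoming
signal through nextMasterState(); a false return marks a signal that the
state list does not expect in that state, which is the kind of situation the
error-insert test cases above provoke on purpose.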