product/ERP5/Tool/TaskDistributionTool.py · 8c90e61cf17bd7e929ef9968a4fa6021d13e3339 · nexedi / erp5

Up to now, once all draft test result lines had been processed, test result lines that were already started were assigned again to every test node. This was designed to cover the case where the test node initially assigned was unable to finish its work (a test node or machine can die for various reasons). But a dying testnode should be rare, so the optimisation should not assume that this happens all the time, even though we must take into account that it can happen. This led to cases where a testnode, instead of leaving a test suite to process another one, was assigned a test that was already assigned. We could thus lose one hour of a testnode that could have done much more useful work than repeating the work of another testnode.

Thus, consider that testnodes are usually able to finish their work, and make testnodes immediately work on another test suite once all tests of a test result are started. Then, regularly run an alarm that looks for stuck tests and restarts them, so that work already assigned is reassigned only when required.

This change should make the system more reactive when things are working (which is the majority of the time). Failing cases will still finish eventually, but in a less reactive way. If we urgently need a test result and see that a test is stuck, it is also possible to unblock it by hand (if we do not want to wait for the alarm).
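
As an illustration only, the policy described in this message could be sketched as follows. This is not the code of TaskDistributionTool.py: the names `TestLine`, `TestResult`, `start_next_line`, `restart_stuck_lines` and the one-hour `STUCK_TIMEOUT` are hypothetical and chosen just to show the idea of "move on when everything is started, let an alarm reset stuck lines".

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical threshold: a line started longer ago than this is considered stuck.
STUCK_TIMEOUT = 3600.0


@dataclass
class TestLine:
    name: str
    state: str = "draft"  # draft -> started -> stopped
    node: Optional[str] = None
    started_at: Optional[float] = None


@dataclass
class TestResult:
    suite: str
    lines: List[TestLine] = field(default_factory=list)


def start_next_line(result: TestResult, node: str, now: float) -> Optional[TestLine]:
    """Give the next draft line to `node`.

    Returns None when every line is already started or stopped, which
    tells the caller that the node should move on to another test suite
    instead of repeating work assigned to another node.
    """
    for line in result.lines:
        if line.state == "draft":
            line.state = "started"
            line.node = node
            line.started_at = now
            return line
    return None


def restart_stuck_lines(result: TestResult, now: float,
                        timeout: float = STUCK_TIMEOUT) -> List[str]:
    """Alarm body: put lines that were started too long ago back in draft."""
    restarted = []
    for line in result.lines:
        if line.state == "started" and now - line.started_at > timeout:
            line.state = "draft"
            line.node = None
            line.started_at = None
            restarted.append(line.name)
    return restarted
```

In this sketch, a stuck line becomes available again only after the alarm has run, which mirrors the trade-off above: the normal case is faster, and the failing case is slower but still completes. Calling `restart_stuck_lines` directly with a small `timeout` would correspond to unblocking a test by hand.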

TaskDistributionTool.py 12 KB
