Commit 8a599ee2 authored by Vincent Pelletier

Add some replication & tpc_finish TODOs.

git-svn-id: 71dcc9de-d417-0410-9af5-da40c76e7ee4
parent 426c2ebb
@@ -132,6 +132,38 @@ RC - Review output of pylint (CODE)
be split in chunks and processed in "background" on storage nodes.
Packing throttling should probably be at the lowest possible priority
(below interactive use and below replication).
- Replication throttling (HIGH AVAILABILITY)
Replication should not prevent clients from accessing storage nodes with
good responsiveness.
See "Replication pipelining".
- Replication pipelining (SPEED)
Replication currently requires too many exchanges between the replicating
and reference storages, and network latency can become a significant limit.
This should be changed to have just one initial request from
replicating storage, and multiple packets from reference storage with
database range checksums. When receiving these checksums, replicating
storage must compare with what it has, and ask for row lists (might not even
be required) and data when there are differences. Quick fetching from
network with asynchronous checking (=queueing) + congestion control
(asking the reference storage to pause its packet flow) will probably be
required.
This should make it easier to throttle replication workload on reference
storage node, as it can decide to postpone replication-related packets on
its own.
- Partial replication (SPEED)
In its current implementation, replication always happens on a whole
partition. In typical use, only the last few transactions will have been
missed, so replicating only past a given TID would be much faster.
To achieve this, storage nodes must store 2 values:
- a pack identifier, which must be different each time a pack occurs
(increasing number sequence, TID-ish, etc) to trigger a
whole-partition replication when a pack happened (this could be
improved too, later)
- the latest (-ish) transaction committed locally, to use as a lower
replication boundary
- tpc_finish failures propagation to master (FUNCTIONALITY)
When asked to lock transaction data, if something goes wrong the master
node must be informed.
- Master node data redundancy (HIGH AVAILABILITY)
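A minimal sketch of the range-checksum comparison described in the "Replication pipelining" item, written as an illustration rather than as NEO's actual protocol. The row model (`(tid, payload)` pairs sorted by TID), the names `checksum_range`, `reference_packets` and `ranges_to_fetch`, the chunk size, and the use of SHA-1 are all assumptions made for the example. Ranges are contiguous and the last one is open-ended, so rows present only on the replicating side are also detected.

```python
import hashlib

MAX_TID = 2**64 - 1  # assumption: TIDs fit in 8 bytes


def checksum_range(rows, min_tid, max_tid):
    """Checksum every (tid, payload) row whose tid lies in [min_tid, max_tid]."""
    digest = hashlib.sha1()
    for tid, payload in rows:
        if min_tid <= tid <= max_tid:
            digest.update(tid.to_bytes(8, "big"))
            digest.update(payload)
    return digest.hexdigest()


def reference_packets(rows, chunk_size):
    """Reference side: yield (min_tid, max_tid, checksum) for contiguous ranges.

    `rows` must be sorted by tid. The last range extends to MAX_TID.
    """
    lower = 0
    for start in range(0, max(len(rows), 1), chunk_size):
        chunk = rows[start:start + chunk_size]
        is_last = start + chunk_size >= len(rows)
        upper = MAX_TID if is_last else chunk[-1][0]
        yield lower, upper, checksum_range(chunk, lower, upper)
        lower = upper + 1


def ranges_to_fetch(local_rows, packets):
    """Replicating side: return the ranges whose local checksum differs."""
    return [
        (lower, upper)
        for lower, upper, digest in packets
        if checksum_range(local_rows, lower, upper) != digest
    ]


reference = [(1, b"a"), (2, b"b"), (3, b"c"), (4, b"d")]
replica = [(1, b"a"), (2, b"b"), (3, b"c")]  # missed the last transaction
missing = ranges_to_fetch(replica, reference_packets(reference, chunk_size=2))
```

Because the reference side only produces a stream of packets after a single request, it remains free to delay them, which is the throttling property the TODO item is after. Queueing and congestion control are left out of this sketch.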
@@ -161,6 +193,9 @@ RC - Review output of pylint (CODE)
instead of parsing the whole partition table. (SPEED)
- Improve partition table tweaking algorithm to reduce differences between
frequently and rarely used nodes (SCALABILITY)
- tpc_finish failures propagation to client (FUNCTIONALITY)
When a storage node reports a problem during the lock/unlock phase, an error
must be propagated to the client.
- Implement C version of (LOAD LATENCY)
@@ -182,6 +217,8 @@ RC - Review output of pylint (CODE)
- Cache for loadSerial/loadBefore
- Implement restore() ZODB API method to bypass consistency checks during
- tpc_finish failures (FUNCTIONALITY)
New failure cases during tpc_finish must be handled.
- Consider auto-generating cluster name upon initial startup (it might
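The two values listed under "Partial replication" can be sketched as follows. This is an illustration under assumptions, not existing NEO code: `ReplicationState` and `replication_lower_bound` are hypothetical names, and the pack identifier is modelled as a plain integer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReplicationState:
    """Hypothetical per-partition state kept by a storage node."""
    pack_id: int   # must differ each time a pack occurs
    last_tid: int  # latest (-ish) transaction committed locally


def replication_lower_bound(local: ReplicationState,
                            reference_pack_id: int) -> Optional[int]:
    """Return the TID to replicate from, or None for a whole partition.

    A pack identifier mismatch means a pack happened since the last
    replication, so the whole partition must be replicated again.
    """
    if local.pack_id != reference_pack_id:
        return None
    return local.last_tid


state = ReplicationState(pack_id=3, last_tid=100)
partial = replication_lower_bound(state, reference_pack_id=3)
full = replication_lower_bound(state, reference_pack_id=4)
```

Since the stored TID is only approximately the latest one, treating it as an inclusive lower boundary is the safer choice: a few transactions may be checked twice, but none are skipped.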