Commit f8f1ba58 authored by Grégory Wisniewski

Remove trailing whitespaces.

git-svn-id: https://svn.erp5.org/repos/neo/trunk@1915 71dcc9de-d417-0410-9af5-da40c76e7ee4
parent 05739b87
@@ -18,7 +18,7 @@ RC - write ZODB-API-level tests
Code
Code changes often impact more than just one node. They are categorised by
node where the most important changes are needed.
General
@@ -26,28 +26,28 @@ RC - Review XXX in the code (CODE)
RC - Review TODO in the code (CODE)
RC - Review output of pylint (CODE)
- Keep-alive (HIGH AVAILABILITY)
  Consider the need to implement a keep-alive system (packets sent
  automatically when there is no activity on the connection for a period
  of time).
- Factorise packet data when sending partition table cells (BANDWIDTH)
  Currently, each cell in a partition table update contains UUIDs of all
  involved nodes.
  It must be changed to a correspondence table using shorter keys (sent
  in the packet) to avoid repeating the same UUIDs many times.
- Make IdleEvents know which message they are expecting (DEBUGGABILITY)
  If a PING packet is sent, there is currently no way to know which
  request created the associated IdleEvent, nor which response is expected
  (knowing either should be enough).
- Consider using multicast for cluster-wide notifications. (BANDWIDTH)
  Currently, multi-receiver notifications are sent in unicast to each
  receiver. Multicast should be used.
- Remove sleeps (LATENCY, CPU WASTE)
  Code still contains many delays (explicit sleeps or polling timeouts).
  They must be changed to be either infinite (sleep until some condition
  becomes true, without waking up needlessly in the meantime) or null
  (don't wait at all).
- Implement delayed connection acceptance.
  Currently, any node that connects too early to another that is busy for
  some reason is immediately rejected with the 'not ready' error code. This
  should be replaced by a queue in the listening node that keeps a pool of
  nodes that will be accepted later, once the conditions are satisfied.
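The delayed-acceptance item above could be sketched as follows. This is a minimal illustration only: `PendingAcceptQueue`, `defer` and `accept_ready` are hypothetical names that do not come from the NEO code base, and the readiness test is passed in as a callable because the TODO does not say which conditions apply.

```python
from collections import deque


class PendingAcceptQueue:
    """Park early connections instead of rejecting them as 'not ready'.

    Hypothetical helper, not part of NEO.
    """

    def __init__(self):
        self._pending = deque()

    def defer(self, conn):
        # Keep the connection until the listening node can serve it.
        self._pending.append(conn)

    def accept_ready(self, is_ready, accept):
        """Accept, in arrival order, each parked connection that is ready.

        Connections for which is_ready(conn) is false stay parked.
        Returns the list of connections accepted by this call.
        """
        still_waiting = deque()
        accepted = []
        while self._pending:
            conn = self._pending.popleft()
            if is_ready(conn):
                accept(conn)
                accepted.append(conn)
            else:
                still_waiting.append(conn)
        self._pending = still_waiting
        return accepted

    def __len__(self):
        return len(self._pending)
```

The listening node would call `accept_ready` whenever its state changes, so peers no longer need to reconnect and retry.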
@@ -57,34 +57,34 @@ RC - Review output of pylint (CODE)
  Masters involved in the election process should still reject any connection
  as the primary master is still unknown.
- Connections must support 2 simultaneous handlers (CODE)
  Connections currently define only one handler, which is enough for
  single-threaded code. But when using multithreaded code, there are 2
  possible handlers involved in a packet reception:
  - The first one handles notifications only (nothing special to do
    regarding multithreading)
  - The second one handles expected messages (such messages must be
    directed to the right thread)
  It must be possible to set the second handler on the connection when that
  connection is thread-safe (MT version of connection classes).
  Also, the code to detect whether a response is expected or not must be
  made generic and moved out of handlers.
- Pack (FEATURE)
- Control that client processed all invalidations before starting a
  transaction (CONSISTENCY)
  If a client starts a transaction before it received an invalidation
  message caused by a committed transaction, it will use outdated data.
  This is a known bug in ZEO.
- Factorise node initialisation for admin, client and storage (CODE)
  The same code to ask/receive node list and partition table exists in too
  many places.
- Clarify handler methods to call when a connection is accepted from a
  listening connection and when remote node is identified
  (cf. neo/bootstrap.py).
- Choose how to handle a storage integrity verification when it comes back.
  Should it go through the replication process or the verification stage,
  with or without unfinished transactions? Do cells have to be set as
  outdated and, if so, should the partition table changes be broadcast?
  (BANDWIDTH, SPEED)
- Review PENDING/HIDDEN/SHUTDOWN states, don't use notifyNodeInformation()
  to do a state-switch, use an exception-based mechanism? (CODE)
- Ensure that registered timeouts are cancelled if the related connection was
  closed. (CODE)
@@ -105,7 +105,7 @@ RC - Review output of pylint (CODE)
  dropNode to reduce packet processing complexity and reduce bad actions
  like setting a node in TEMPORARILY_DOWN state.
- Consider processing writable events in the event.poll() method to ensure
  that pending outgoing data is sent when the network is ready, to avoid
  waiting for an incoming packet that triggers the poll() system call.
- Allow daemonizing NEO processes, re-use code from TIDStorage and support
  start/stop/restart/status commands.
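For the "Factorise packet data when sending partition table cells" item in the General list, the correspondence table could look like the sketch below. The tuple layout `(offset, uuid, state)` and the function names are assumptions made for illustration; they are not NEO's actual packet format.

```python
def factorise_cells(cells):
    """Replace repeated UUIDs by short integer keys.

    cells: list of (offset, uuid, state) tuples (assumed layout).
    Returns (uuid_table, compact_cells): uuid_table[key] is the UUID and
    compact_cells holds (offset, key, state) tuples.
    """
    uuid_table = []
    key_of = {}
    compact_cells = []
    for offset, uuid, state in cells:
        key = key_of.get(uuid)
        if key is None:
            # First time this UUID is seen: give it the next short key.
            key = key_of[uuid] = len(uuid_table)
            uuid_table.append(uuid)
        compact_cells.append((offset, key, state))
    return uuid_table, compact_cells


def expand_cells(uuid_table, compact_cells):
    """Inverse of factorise_cells, as the receiver would apply it."""
    return [(offset, uuid_table[key], state)
            for offset, key, state in compact_cells]
```

Each 16-byte UUID is then sent once per packet, and every cell carries only a small index.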
@@ -113,28 +113,28 @@ RC - Review output of pylint (CODE)
Storage
- Implement incremental storage verification (BANDWIDTH)
  When a partition cell is in out-of-date state, the entire transaction
  history is checked.
  This is because there might be gaps in cell tid history, as an out-of-date
  node is writable (although non-readable).
  It should use an incremental mechanism to only check transactions past a
  certain TID known to have no gap.
- Use embedded MySQL database instead of a stand-alone MySQL server.
  (LATENCY)(to be discussed)
- Make replication work even in non-operational cluster state
  (HIGH AVAILABILITY)
  When a master decides a partition change triggering replication,
  replication should happen independently of cluster state. (Maybe we still
  need a primary master, to avoid replicating from an outdated partition
  table setup.)
- Asynchronously flush objects from partition cells not served (DISK SPACE)
- Close connections to other storage nodes (SYSTEM RESOURCE USAGE)
  When a replication finishes, the connection is not closed currently. It
  should be closed (possibly asynchronously, and possibly by detecting that
  connection is idle - similar to keep-alive principle)
- Notify master when storage becomes available for clients (LATENCY)
  Currently, storage presence is broadcast to client nodes too early, as
  the storage node would refuse them until it has only up-to-date data (not
  only up-to-date cells, but also a partition table and node states).
- Create a specialized PartitionTable that knows the database and replicator
  to remove duplicates and remove logic from handlers (CODE)
@@ -164,26 +164,26 @@ RC - Review output of pylint (CODE)
Master
- Master node data redundancy (HIGH AVAILABILITY)
  Secondary master nodes should replicate primary master data (i.e., primary
  master should inform them of such changes).
  This data takes too long to extract from storage nodes, and losing it
  increases the risk of starting from underestimated values.
  This risk is (currently) unavoidable when all nodes stop running, but this
  case must be avoided.
- Don't reject peers during startup phases (STARTUP LATENCY)
  When (for example) a client sends a RequestNodeIdentification to the
  primary master node while the cluster is not yet operational, the primary
  master should postpone the node acceptance until the cluster is
  operational, instead of closing the connection immediately. This would
  avoid the need to poll the master to know when it is ready.
- Differential partition table updates (BANDWIDTH)
  When a storage asks for current partition table (when it connects to a
  cluster in service state), it must update its knowledge of the partition
  table. Currently it's done by fetching the entire table. If the master
  keeps a history of the last few changes to the partition table, it would
  be able to only send a differential update (via the incremental update
  mechanism)
- During recovery phase, store multiple partition tables (ADMINISTRATION)
  When storage nodes know different versions of the partition table, the
  master should be able to present them to the administrator, who can then
  choose one when moving on to next phase.
- Optimize operational status check by recording which rows are ready
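The "Differential partition table updates" item in the Master list can be illustrated with a bounded change history. This is a sketch under stated assumptions: the class `PartitionTableHistory`, its methods, and the use of a monotonically increasing `ptid` per change are hypothetical and not taken from NEO.

```python
from collections import deque


class PartitionTableHistory:
    """Master-side table plus the last few changes (hypothetical sketch)."""

    def __init__(self, max_changes=3):
        self.ptid = 0
        self.table = {}  # (offset, uuid) -> state
        # Oldest entries are dropped automatically once maxlen is reached.
        self._history = deque(maxlen=max_changes)

    def apply(self, cell_list):
        """Record one change: a list of (offset, uuid, state) tuples."""
        self.ptid += 1
        for offset, uuid, state in cell_list:
            self.table[(offset, uuid)] = state
        self._history.append((self.ptid, list(cell_list)))

    def update_for(self, known_ptid):
        """Return ('diff', changes) when history covers the gap,
        else ('full', copy of the whole table)."""
        if known_ptid == self.ptid:
            return "diff", []
        if self._history:
            oldest = self._history[0][0]
            if oldest - 1 <= known_ptid < self.ptid:
                changes = [(ptid, cells) for ptid, cells in self._history
                           if ptid > known_ptid]
                return "diff", changes
        return "full", dict(self.table)
```

A storage that reconnects with a recent `ptid` gets only the missing changes; one that is too far behind falls back to the full table, which is the current behaviour.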
@@ -192,16 +192,16 @@ RC - Review output of pylint (CODE)
  frequently and rarely used nodes (SCALABILITY)
Client
- Client should prefer storage nodes it's already connected to when
  retrieving objects (LOAD LATENCY)
- Implement C version of mq.py (LOAD LATENCY)
- Move object data replication task to storage nodes (COMMIT LATENCY)
  Currently the client node must send the data of a single object to all
  storage nodes in charge of the partition cell containing that object. This
  increases the time the client has to wait for storage response, and
  increases client-to-storage bandwidth usage. It must be possible to send
  object data to only one storage and that storage should automatically
  replicate on other storages. Locks on objects would then be released by
  storage nodes.
- Use generic bootstrap module (CODE)
- Find a way to make ask() from the thread poll to allow sending initial packet
@@ -215,14 +215,14 @@ RC - Review output of pylint (CODE)
  imports.
Later
- Consider auto-generating cluster name upon initial startup (it might
  actually be a partition property).
- Consider ways to centralise the configuration file, or make the
  configuration updatable automatically on all nodes.
- Consider storing some metadata on master nodes (partition table [version],
  ...). This data should be treated non-authoritatively, as a way to lower
  the probability to use an outdated partition table.
- Decentralize primary master tasks as much as possible (consider
  distributed lock mechanisms, ...)
- Make admin node able to monitor multiple clusters simultaneously
- Choose how to compute the storage size
@@ -231,8 +231,8 @@ RC - Review output of pylint (CODE)
- When importing data, objects with non-allocated OIDs are stored. The
  storage can detect this and could notify the master not to allocate lower
  OIDs. But during import, each object stored triggers this notification and
  may cause a big network overhead. It would be better to refuse any client
  connection and thus no OID allocation during import. It may be interesting
  to create a new stage for the cluster startup... to be discussed.
- Simple deployment solution, based on embedded database, integrated master
  and storage node that works out of the box