Commit ed7ee13b authored by Kirill Smelkov

Merge tag 'v1.10'

NEO 1.10

* tag 'v1.10': (55 commits)
  Release version 1.10
  Maximize resiliency by taking into account the topology of storage nodes
  storage: also commit updated cell TID at each replicated chunk of 'obj' records
  storage: skip useless work when unlocking transactions
  qa: flush logs at the end of each test when -L is not used
  qa: add a log in case that a mysterious bug happens again
  storage: clarify log about data deletion of discarded cells
  debug: new example to run the profiler for 1 minute
  mysql: fix replication of big oids (> 16M)
  tests/cluster: speedup waiting a bit
  protocol: update packet docstrings
  Bump protocol version
  protocol: a single byte is more than enough to encode enums
  protocol: small cleanup in packet registration
  Optimize resumption of replication by starting from a greater TID
  importer: update comment about a workaround for ZODB3
  Micro-optimization of p64/u64
  qa: add a log in testBackupNodeLost for easier debugging
  Document that the bug when checking replicas may also cause the master to crash
  storage: stop logging 'Abort TXN' for txn that have been locked
  storage: split _migrate2() for reusable _alterTable()
  qa: new testStorageUpgrade
  qa: update testStorageUpgrade data for what is not automatically upgraded
  qa: original data for the future testStorageUpgrade
  sqlite: fix indexes of upgraded db
  importer: fix NameError when recovering during tpc_finish
  fixup! importer: fetch and process the data to import in a separate process
  Serialize empty transaction extension with an empty string
  client: fix partial import from a source storage
  qa: give a title to subprocesses of functional tests
  importer: give a title to the 'import' and 'writeback' subprocesses
  importer: fetch and process the data to import in a separate process
  importer: new option to write back new transactions to the source database
  importer: log when the transaction index for FileStorage DB is built
  importer: open imported zodb in read-only whenever possible
  fixup! mysql: fix remaining places where a server disconnection was not catched
  fixup! storage: speed up replication by sending bigger network packets
  mysql: do not full-scan for duplicates of big oids if deduplication is disabled
  mysql: fix remaining places where a server disconnection was not catched
  fixup! Add support for custom compression levels
  importer: reenable compression by default
  qa: review testImporter
  qa: remove a few uses of 'chr'
  Fix a few issues with ZODB5
  importer: small code cleanup in speedupFileStorageTxnLookup patch
  importer: do not trigger speedupFileStorageTxnLookup uselessly
  Add support for custom compression levels
  setup: update MANIFEST.in
  importer: do not checksum data twice
  client: store uncompressed if compressed size is equal
  fixup! master: automatically discard feeding cells that get out-of-date
  master: automatically discard feeding cells that get out-of-date
  qa: remove useless indentation in testSafeTweak
  bench: new option to mesure ZEO perfs in matrix test
  bench: reduce number of partitions in matrix test
  storage: fix replication of creation undone
parents 27df6fe8 1ef5c1ba
......@@ -16,6 +16,19 @@ This happens in the following conditions:
4. the cell is checked completely before it could replicate up to the max tid
to check
Sometimes, it causes the master to crash::
File "neo/lib/handler.py", line 72, in dispatch
method(conn, *args, **kw)
File "neo/master/handlers/storage.py", line 93, in notifyReplicationDone
cell_list = app.backup_app.notifyReplicationDone(node, offset, tid)
File "neo/master/backup_app.py", line 337, in notifyReplicationDone
assert cell.isReadable()
AssertionError
Workaround: make sure all cells are up-to-date before checking replicas.
Found by running testBackupNodeLost many times.
Found by running testBackupNodeLost many times:
- either a failureException: 12 != 11
- or the above assert failure, in which case the unit test freezes
Change History
==============
1.10 (2018-07-16)
-----------------
An important performance improvement is that the replication now remembers where
it was interrupted: a storage node that gets disconnected for a short time now
becomes fully operational almost instantly because it only has to replicate
the new data. Before, the time to recover depended on the size of the DB, just
to verify that most of the data was already transferred.
As a small optimization, an empty transaction extension is now serialized with
an empty string.
The above 2 changes required a bump of the protocol version, as well as an
upgrade of the storage format. Once upgraded (this is done automatically as
usual), databases can't be opened anymore by older versions of NEO.
Other general changes:
- Add support for custom compression levels.
- Maximize resiliency by taking into account the topology of storage nodes.
- Fix a few issues with ZODB5. Note however that merging several DB with the
Importer backend only works if they were only used with ZODB < 5.
Master:
- Automatically discard feeding cells that get out-of-date.
Client:
- Fix partial import from a source storage.
- Store uncompressed if compressed size is equal.
Storage:
- Fixed v1.9 code that sped up the replication by sending bigger network
packets.
- Fix replication of creation undone.
- Stop logging 'Abort TXN' for txn that have been locked.
- Clarify log about data deletion of discarded cells.
MySQL backend:
- Fix replication of big OIDs (> 16M).
- Do not full-scan for duplicates of big OIDs if deduplication is disabled.
- Fix remaining places where a server disconnection was not caught.
SQLite backend:
- Fix indexes of upgraded databases.
Importer backend:
- Fetch and process the data to import in a separate process. It is even
usually free to use the best compression level.
- New option to write back new transactions to the source database.
See 'importer.conf' for more information.
- Give a title to the 'import' and 'writeback' subprocesses,
if the 'setproctitle' egg is installed.
- Log when the transaction index for FileStorage DB is built.
- Open imported database in read-only whenever possible.
- Do not trigger speedupFileStorageTxnLookup uselessly.
- Do not checksum data twice.
- Fix NameError when recovering during tpc_finish.
1.9 (2018-03-13)
----------------
......
graft tools
include neo.conf CHANGELOG.rst TODO TESTS.txt ZODB3.patch
include neo.conf CHANGELOG.rst TODO ZODB3.patch
......@@ -45,6 +45,12 @@
# (instead of adapter=Importer & database=/path_to_this_file).
adapter=MySQL
database=neo
# Keep writing back new transactions to the source database, provided it is
# not split. In case of any issue, the import can be aborted without losing
# data. Note however that it is asynchronous so don't stop the storage node
# too quickly after the last committed transaction (e.g. check with tools like
# fstail).
writeback=true
# The other sections are for source databases.
[root]
......@@ -52,7 +58,8 @@ database=neo
# ZEO is possible but less efficient: ZEO servers must be stopped
# if NEO opens FileStorage DBs directly.
# Note that NEO uses 'new_oid' method to get the last OID, that's why the
# source DB can't be open read-only. NEO never modifies a FileStorage DB.
# source DB can't be open read-only. Unless 'writeback' is enabled, NEO never
# modifies a FileStorage DB.
storage=
<filestorage>
path /path/to/root.fs
......
......@@ -160,11 +160,7 @@ class Storage(BaseStorage.BaseStorage,
def copyTransactionsFrom(self, source, verbose=False):
""" Zope compliant API """
return self.importFrom(source)
def importFrom(self, source, start=None, stop=None, preindex=None):
""" Allow import only a part of the source storage """
return self.app.importFrom(self, source, start, stop, preindex)
return self.app.importFrom(self, source)
def pack(self, t, referencesf, gc=False):
if gc:
......
......@@ -14,11 +14,14 @@
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
from cPickle import dumps, loads
from zlib import compress, decompress
import heapq
import time
try:
from ZODB._compat import dumps, loads, _protocol
except ImportError:
from cPickle import dumps, loads
_protocol = 1
from ZODB.POSException import UndoError, ConflictError, ReadConflictError
from . import OLD_ZODB
if OLD_ZODB:
......@@ -26,6 +29,7 @@ if OLD_ZODB:
from persistent.TimeStamp import TimeStamp
from neo.lib import logging
from neo.lib.compress import decompress_list, getCompress
from neo.lib.protocol import NodeTypes, Packets, \
INVALID_PARTITION, MAX_TID, ZERO_HASH, ZERO_TID
from neo.lib.util import makeChecksum, dump
......@@ -50,7 +54,6 @@ if SignalHandler:
import signal
SignalHandler.registerHandler(signal.SIGUSR2, logging.reopen)
class Application(ThreadedApplication):
"""The client node application."""
......@@ -99,7 +102,7 @@ class Application(ThreadedApplication):
# _connecting_to_master_node is used to prevent simultaneous master
# node connection attempts
self._connecting_to_master_node = Lock()
self.compress = compress
self.compress = getCompress(compress)
def __getattr__(self, attr):
if attr in ('last_tid', 'pt'):
......@@ -215,7 +218,7 @@ class Application(ThreadedApplication):
node=node,
dispatcher=self.dispatcher)
p = Packets.RequestIdentification(
NodeTypes.CLIENT, self.uuid, None, self.name, None)
NodeTypes.CLIENT, self.uuid, None, self.name, (), None)
try:
ask(conn, p, handler=handler)
except ConnectionClosed:
......@@ -388,7 +391,7 @@ class Application(ThreadedApplication):
logging.error('wrong checksum from %s for oid %s',
conn, dump(oid))
raise NEOStorageReadRetry(False)
return (decompress(data) if compression else data,
return (decompress_list[compression](data),
tid, next_tid, data_tid)
raise NEOStorageCreationUndoneError(dump(oid))
return self._askStorageForRead(oid,
......@@ -435,17 +438,7 @@ class Application(ThreadedApplication):
checksum = ZERO_HASH
else:
assert data_serial is None
size = len(data)
if self.compress:
compressed_data = compress(data)
if size < len(compressed_data):
compressed_data = data
compression = 0
else:
compression = 1
else:
compression = 0
compressed_data = data
size, compression, compressed_data = self.compress(data)
checksum = makeChecksum(compressed_data)
txn_context.data_size += size
# Store object in tmp cache
......@@ -554,9 +547,12 @@ class Application(ThreadedApplication):
txn_context = self._txn_container.get(transaction)
self.waitStoreResponses(txn_context)
ttid = txn_context.ttid
ext = transaction._extension
ext = dumps(ext, _protocol) if ext else ''
# user and description are cast to str in case they're unicode.
# BBB: This is not required anymore with recent ZODB.
packet = Packets.AskStoreTransaction(ttid, str(transaction.user),
str(transaction.description), dumps(transaction._extension),
txn_context.cache_dict)
str(transaction.description), ext, txn_context.cache_dict)
queue = txn_context.queue
involved_nodes = txn_context.involved_nodes
# Ask in parallel all involved storage nodes to commit object metadata.
......@@ -786,10 +782,6 @@ class Application(ThreadedApplication):
self.waitStoreResponses(txn_context)
return None, txn_oid_list
def _insertMetadata(self, txn_info, extension):
for k, v in loads(extension).items():
txn_info[k] = v
def _getTransactionInformation(self, tid):
return self._askStorageForRead(tid,
Packets.AskTransactionInformation(tid))
......@@ -829,7 +821,8 @@ class Application(ThreadedApplication):
if filter is None or filter(txn_info):
txn_info.pop('packed')
txn_info.pop("oids")
self._insertMetadata(txn_info, txn_ext)
if txn_ext:
txn_info.update(loads(txn_ext))
append(txn_info)
if len(undo_info) >= last - first:
break
......@@ -857,7 +850,7 @@ class Application(ThreadedApplication):
tid = None
for tid in tid_list:
(txn_info, txn_ext) = self._getTransactionInformation(tid)
txn_info['ext'] = loads(txn_ext)
txn_info['ext'] = loads(txn_ext) if txn_ext else {}
append(txn_info)
return (tid, txn_list)
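The hunks above change how the transaction extension travels: an empty extension is sent as an empty string, and readers treat an empty string as "no extension". A minimal sketch of that round trip, assuming Python 3 and the standard `pickle` module (the source uses `ZODB._compat` or `cPickle` under Python 2); the helper names `encode_ext` and `decode_ext` are hypothetical and do not exist in NEO:

```python
import pickle


def encode_ext(ext, protocol=2):
    """Serialize a transaction extension; an empty or missing one becomes b''."""
    # Hypothetical helper mirroring: dumps(ext, _protocol) if ext else ''
    return pickle.dumps(ext, protocol) if ext else b''


def decode_ext(raw):
    """Inverse of encode_ext: an empty string decodes to an empty dict."""
    # Hypothetical helper mirroring: loads(txn_ext) if txn_ext else {}
    return pickle.loads(raw) if raw else {}


if __name__ == '__main__':
    print(repr(encode_ext({})))
    print(decode_ext(encode_ext({'user_agent': 'demo'})))
```

The point of the special case is that the common "no extension" transaction no longer carries the few bytes of a pickled empty dict, and the reader skips unpickling entirely.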
......@@ -876,23 +869,29 @@ class Application(ThreadedApplication):
txn_info['size'] = size
if filter is None or filter(txn_info):
result.append(txn_info)
self._insertMetadata(txn_info, txn_ext)
if txn_ext:
txn_info.update(loads(txn_ext))
return result
def importFrom(self, storage, source, start, stop, preindex=None):
def importFrom(self, storage, source):
# TODO: The main difference with BaseStorage implementation is that
# preindex can't be filled with the result 'store' (tid only
# known after 'tpc_finish'. This method could be dropped if we
# implemented IStorageRestoreable (a wrapper around source would
# still be required for partial import).
if preindex is None:
preindex = {}
for transaction in source.iterator(start, stop):
for transaction in source.iterator():
tid = transaction.tid
self.tpc_begin(storage, transaction, tid, transaction.status)
for r in transaction:
oid = r.oid
pre = preindex.get(oid)
try:
pre = preindex[oid]
except KeyError:
try:
pre = self.load(oid)[1]
except NEOStorageNotFoundError:
pre = ZERO_TID
self.store(oid, pre, r.data, r.version, transaction)
preindex[oid] = tid
conflicted = self.tpc_vote(transaction)
......
......@@ -14,10 +14,14 @@
Give the name of the cluster
</description>
</key>
<key name="compress" datatype="boolean">
<key name="compress" datatype=".compress">
<description>
If true, data is automatically compressed (unless compressed size is
not smaller). This is the default behaviour.
The value is either of 'boolean' type or an explicit algorithm that
matches the regex 'zlib(=\d+)?', where the optional number is
the compression level.
Any record that is not smaller once compressed is stored uncompressed.
True is the default and its meaning may change over time:
currently, it is the same as 'zlib'.
</description>
</key>
<key name="read-only" datatype="boolean">
......
......@@ -23,3 +23,11 @@ class NeoStorage(BaseConfig):
config = self.config
return Storage(**{k: getattr(config, k)
for k in config.getSectionAttributes()})
def compress(value):
from ZConfig.datatypes import asBoolean
try:
return asBoolean(value)
except ValueError:
from neo.lib.compress import parseOption
return parseOption(value)
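To illustrate the values accepted by the `compress` key (a boolean, or an algorithm matching `zlib(=\d+)?`), here is a self-contained sketch in Python 3. It is not the code of this commit: the boolean spellings stand in for `ZConfig.datatypes.asBoolean` and are an assumption, the function name `parse_compress` is hypothetical, and the algorithm is returned by name rather than by index as `parseOption` does.

```python
import re
import zlib

# The regex given in the option description, anchored at the end.
_ZLIB_OPTION = re.compile(r'zlib(=\d+)?$')


def parse_compress(value):
    """Return True/False for booleans, else ('zlib', level or None)."""
    lowered = value.lower()
    # Assumed boolean spellings; the real code delegates to ZConfig.
    if lowered in ('true', 'yes', 'on'):
        return True
    if lowered in ('false', 'no', 'off'):
        return False
    match = _ZLIB_OPTION.match(value)
    if match is None:
        raise ValueError("not a valid 'compress' option: %r" % value)
    if match.group(1) is None:
        return 'zlib', None  # algorithm without explicit level
    level = int(match.group(1)[1:])  # strip the leading '='
    if 0 < level <= zlib.Z_BEST_COMPRESSION:
        return 'zlib', level
    raise ValueError("invalid compression level: %r" % level)


if __name__ == '__main__':
    for text in ('true', 'zlib', 'zlib=9'):
        print(text, '->', parse_compress(text))
```

As in the source, level 0 is rejected: it would mean "no compression" for zlib, which is what a false value already expresses.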
......@@ -14,10 +14,10 @@
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
from zlib import decompress
from ZODB.TimeStamp import TimeStamp
from neo.lib import logging
from neo.lib.compress import decompress_list
from neo.lib.protocol import Packets, uuid_str
from neo.lib.util import dump, makeChecksum
from neo.lib.exception import NodeNotReady
......@@ -129,8 +129,7 @@ class StorageAnswersHandler(AnswerBaseHandler):
'wrong checksum while getting back data for'
' object %s during rebase of transaction %s'
% (dump(oid), dump(txn_context.ttid)))
if compression:
data = decompress(data)
data = decompress_list[compression](data)
size = len(data)
txn_context.data_size += size
if cached:
......
......@@ -47,7 +47,7 @@ class ConnectionPool(object):
conn = MTClientConnection(app, app.storage_event_handler, node,
dispatcher=app.dispatcher)
p = Packets.RequestIdentification(NodeTypes.CLIENT,
app.uuid, None, app.name, app.id_timestamp)
app.uuid, None, app.name, (), app.id_timestamp)
try:
app._ask(conn, p, handler=app.storage_bootstrap_handler)
except ConnectionClosed:
......
......@@ -164,3 +164,17 @@ elif IF == 'frames':
write("Thread %s:\n" % thread_id)
traceback.print_stack(frame)
write("End of dump\n")
elif IF == 'profile':
DURATION = 60
def stop(prof, path):
prof.disable()
prof.dump_stats(path)
@defer
def profile(app):
import cProfile, threading, time
from .lib.protocol import uuid_str
path = 'neo-%s-%s.prof' % (uuid_str(app.uuid), time.time())
prof = cProfile.Profile()
threading.Timer(DURATION, stop, (prof, path)).start()
prof.enable()
......@@ -26,13 +26,14 @@ class BootstrapManager(EventHandler):
Manage the bootstrap stage, lookup for the primary master then connect to it
"""
def __init__(self, app, node_type, server=None):
def __init__(self, app, node_type, server=None, devpath=()):
"""
Manage the bootstrap stage of a non-master node, it lookup for the
primary master node, connect to it then returns when the master node
is ready.
"""
self.server = server
self.devpath = devpath
self.node_type = node_type
self.num_replicas = None
self.num_partitions = None
......@@ -43,7 +44,7 @@ class BootstrapManager(EventHandler):
def connectionCompleted(self, conn):
EventHandler.connectionCompleted(self, conn)
conn.ask(Packets.RequestIdentification(self.node_type, self.uuid,
self.server, self.app.name, None))
self.server, self.app.name, self.devpath, None))
def connectionFailed(self, conn):
EventHandler.connectionFailed(self, conn)
......
#
# Copyright (C) 2018 Nexedi SA
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
import zlib
decompress_list = (
lambda data: data,
zlib.decompress,
)
def parseOption(value):
x = value.split('=', 1)
try:
alg = ('zlib',).index(x[0])
if len(x) == 1:
return alg, None
level = int(x[1])
except Exception:
raise ValueError("not a valid 'compress' option: %r" % value)
if 0 < level <= zlib.Z_BEST_COMPRESSION:
return alg, level
raise ValueError("invalid compression level: %r" % level)
def getCompress(value):
if value:
alg, level = (0, None) if value is True else value
_compress = zlib.compress
if level:
zlib_compress = _compress
_compress = lambda data: zlib_compress(data, level)
alg += 1
assert 0 < alg < len(decompress_list), 'invalid compression algorithm'
def compress(data):
size = len(data)
compressed = _compress(data)
if len(compressed) < size:
return size, alg, compressed
return size, 0, data
compress._compress = _compress # for testBasicStore
return compress
return lambda data: (len(data), 0, data)
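The new module pairs a compression function returning `(size, algorithm, payload)` with a tuple of decompressors indexed by algorithm number. The following Python 3 sketch shows the same idea in isolation; `make_compress` and `DECOMPRESS` are hypothetical names chosen for this example, not NEO's API.

```python
import zlib

# Index 0 means "stored as is", index 1 means zlib (same layout as
# decompress_list in the hunk above).
DECOMPRESS = (lambda data: data, zlib.decompress)


def make_compress(level=None):
    """Build a function returning (original_size, algorithm, payload)."""
    def compress(data):
        size = len(data)
        if level is None:
            packed = zlib.compress(data)
        else:
            packed = zlib.compress(data, level)
        # Keep the compressed form only if it is strictly smaller.
        if len(packed) < size:
            return size, 1, packed
        return size, 0, data
    return compress


if __name__ == '__main__':
    compress = make_compress(9)
    for record in (b'a' * 1000, b'x'):
        size, alg, payload = compress(record)
        assert DECOMPRESS[alg](payload) == record
        print(size, alg, len(payload))
```

Storing the algorithm number next to each record is what lets readers call `DECOMPRESS[alg](payload)` without a conditional, and leaves room for other algorithms later.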
......@@ -34,6 +34,7 @@ class SocketConnector(object):
is_closed = is_server = None
connect_limit = {}
CONNECT_LIMIT = 1 # XXX actually this is (RE-)CONNECT_THROTTLE
SOMAXCONN = 5 # for threaded tests
def __new__(cls, addr, s=None):
if s is None:
......@@ -78,6 +79,7 @@ class SocketConnector(object):
def queue(self, data):
was_empty = not self.queued
self.queued += data
for data in data:
self.queue_size += len(data)
return was_empty
......@@ -123,7 +125,7 @@ class SocketConnector(object):
try:
self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
self._bind(self.addr)
self.socket.listen(5)
self.socket.listen(self.SOMAXCONN)
except socket.error, e:
self.socket.close()
self._error('listen', e)
......
......@@ -26,9 +26,6 @@ class PrimaryFailure(NeoException):
class StoppedOperation(NeoException):
pass
class DatabaseFailure(NeoException):
pass
class NodeNotReady(NeoException):
pass
......@@ -22,14 +22,13 @@ def check_signature(reference, function):
a, b, c, d = inspect.getargspec(function)
x = len(A) - len(a)
if x < 0: # ignore extra default parameters
if x + len(d) < 0:
if B or x + len(d) < 0:
return False
del a[x:]
d = d[:x] or None
elif x: # different signature
# We have no need yet to support methods with default parameters.
return a == A[:-x] and (b or a and c) and not (d or D)
return a == A and b == B and c == C and d == D
return a == A[:-x] and (b or a and c) and (d or ()) == (D or ())[:-x]
return a == A and (b or not B) and (c or not C) and d == D
def implements(obj, ignore=()):
ignore = set(ignore)
......@@ -55,7 +54,7 @@ def implements(obj, ignore=()):
while 1:
name, func = base.pop()
x = getattr(obj, name)
if x.im_class is tobj:
if type(getattr(x, '__self__', None)) is tobj:
x = x.__func__
if x is func:
try:
......
......@@ -290,3 +290,16 @@ class NEOLogger(Logger):
logging = NEOLogger()
signal.signal(signal.SIGRTMIN, lambda signum, frame: logging.flush())
signal.signal(signal.SIGRTMIN+1, lambda signum, frame: logging.reopen())
def patch():
def fork():
with logging:
pid = os_fork()
if not pid:
logging._setup()
return pid
os_fork = os.fork
os.fork = fork
patch()
del patch
......@@ -28,6 +28,7 @@ class Node(object):
_connection = None
_identified = False
devpath = ()
id_timestamp = None
def __init__(self, manager, address=None, uuid=None, state=NodeStates.DOWN):
......
......@@ -25,6 +25,7 @@ def speedupFileStorageTxnLookup():
from array import array
from bisect import bisect
from collections import defaultdict
from neo.lib import logging
from ZODB.FileStorage.FileStorage import FileStorage, FileIterator
typecode = 'L' if array('I').itemsize < 4 else 'I'
......@@ -44,6 +45,8 @@ def speedupFileStorageTxnLookup():
try:
index = self._tidindex
except AttributeError:
logging.info("Building index for faster lookup of"
" transactions in the FileStorage DB.")
# Cache a sorted list of all the file pos from oid index.
# To reduce memory usage, the list is splitted in arrays of
# low order 32-bit words.
......@@ -52,10 +55,10 @@ def speedupFileStorageTxnLookup():
tindex[x >> 32].append(x & 0xffffffff)
index = self._tidindex = []
for h, l in sorted(tindex.iteritems()):
x = array('I')
x.fromlist(sorted(l))
l = self._read_data_header(h << 32 | x[0])
index.append((l.tid, h, x))
l = array(typecode, sorted(l))
x = self._read_data_header(h << 32 | l[0])
index.append((x.tid, h, l))
logging.info("... index built")
x = bisect(index, (start,)) - 1
if x >= 0:
x, h, index = index[x]
......
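The index built in this patch keeps memory usage low by grouping 64-bit file positions by their high 32 bits and storing only the low 32-bit words in compact arrays. A simplified Python 3 sketch of that splitting step, without any FileStorage access; `split_positions` and `join_positions` are hypothetical helpers written for this illustration:

```python
from array import array
from collections import defaultdict

# 'I' is normally 4 bytes wide; fall back to 'L' where it is smaller,
# as the patched code does.
TYPECODE = 'L' if array('I').itemsize < 4 else 'I'


def split_positions(positions):
    """Group 64-bit positions by high word, keeping sorted low words."""
    groups = defaultdict(list)
    for pos in positions:
        groups[pos >> 32].append(pos & 0xffffffff)
    return [(high, array(TYPECODE, sorted(lows)))
            for high, lows in sorted(groups.items())]


def join_positions(index):
    """Rebuild the sorted list of full positions from the split index."""
    return [high << 32 | low for high, lows in index for low in lows]


if __name__ == '__main__':
    sample = [5, (1 << 32) + 7, 3, (1 << 32) + 2]
    index = split_positions(sample)
    print([(high, list(lows)) for high, lows in index])
    assert join_positions(index) == sorted(sample)
```

In the real patch each group additionally records the TID read at its first position, so that `bisect` can locate the group for a given start TID.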
......@@ -22,7 +22,7 @@ from struct import Struct
# The protocol version must be increased whenever upgrading a node may require
# to upgrade other nodes. It is encoded as a 4-bytes big-endian integer and
# the high order byte 0 is different from TLS Handshake (0x16).
PROTOCOL_VERSION = 1
PROTOCOL_VERSION = 4
ENCODED_VERSION = Struct('!L').pack(PROTOCOL_VERSION)
# Avoid memory errors on corrupted data.
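The comment above says the version is sent as a 4-byte big-endian integer whose high-order byte (0) cannot be confused with a TLS handshake record (0x16). A short Python 3 sketch makes this concrete; the helper `looks_like_tls` is hypothetical and only serves the illustration:

```python
from struct import Struct

PROTOCOL_VERSION = 4
# '!L' is a 4-byte unsigned integer in network (big-endian) byte order.
ENCODED_VERSION = Struct('!L').pack(PROTOCOL_VERSION)

TLS_HANDSHAKE = 0x16  # content type byte that starts a TLS handshake record


def looks_like_tls(first_bytes):
    """Hypothetical check: does the peer's first byte announce TLS?"""
    return first_bytes[:1] == bytes([TLS_HANDSHAKE])


if __name__ == '__main__':
    print(ENCODED_VERSION)
    print(looks_like_tls(ENCODED_VERSION))
    print(looks_like_tls(b'\x16\x03\x01'))
```

As long as the version stays below 0x16000000, the first byte received is enough to tell a plain NEO peer from one that starts a TLS handshake.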
......@@ -122,22 +122,22 @@ def NodeStates():
@Enum
def CellStates():
# Normal state: cell is writable/readable, and it isn't planned to drop it.
UP_TO_DATE
# Write-only cell. Last transactions are missing because storage is/was down
# for a while, or because it is new for the partition. It usually becomes
# UP_TO_DATE when replication is done.
OUT_OF_DATE
# Normal state: cell is writable/readable, and it isn't planned to drop it.
UP_TO_DATE
# Same as UP_TO_DATE, except that it will be discarded as soon as another
# node finishes to replicate it. It means a partition is moved from 1 node
# to another.
# to another. It is also discarded immediately if out-of-date.
FEEDING
# Not really a state: only used in network packets to tell storages to drop
# partitions.
DISCARDED
# A check revealed that data differs from other replicas. Cell is neither
# readable nor writable.
CORRUPTED
# Not really a state: only used in network packets to tell storages to drop
# partitions.
DISCARDED
# used for logging
node_state_prefix_dict = {
......@@ -463,7 +463,7 @@ class PEnum(PStructItem):
"""
Encapsulate an enumeration value
"""
_fmt = '!l'
_fmt = 'b'
def __init__(self, name, enum):
PStructItem.__init__(self, name)
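The change of `_fmt` from `'!l'` to `'b'` shrinks every encoded enum value from four bytes to one. A small Python 3 sketch with the standard `struct` module shows the size difference and the range a signed byte can hold; `encode_enum` is a hypothetical helper, not part of `neo.lib.protocol`:

```python
import struct

OLD_FMT = '!l'  # 4-byte signed big-endian integer
NEW_FMT = 'b'   # 1-byte signed integer, range -128..127


def encode_enum(value, fmt=NEW_FMT):
    """Pack an enum value (an integer) with the given struct format."""
    return struct.pack(fmt, value)


if __name__ == '__main__':
    print(struct.calcsize(OLD_FMT), struct.calcsize(NEW_FMT))
    print(encode_enum(5, OLD_FMT), encode_enum(5))
```

A single byte has no byte order, which is why the `!` prefix is no longer needed. Because the wire format changes, this is one of the changes that makes the protocol version bump necessary.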
......@@ -648,7 +648,9 @@ class Error(Packet):
"""
Error is a special type of message, because this can be sent against
any other message, even if such a message does not expect a reply
usually. Any -> Any.
usually.
:nodes: * -> *
"""
_fmt = PStruct('error',
PNumber('code'),
......@@ -657,19 +659,25 @@ class Error(Packet):
class Ping(Packet):
"""
Check if a peer is still alive. Any -> Any.
Empty request used as network barrier.
:nodes: * -> *
"""
_answer = PFEmpty
class CloseClient(Packet):
"""
Tell peer it can close the connection if it has finished with us. Any -> Any
Tell peer that it can close the connection if it has finished with us.
:nodes: * -> *
"""
class RequestIdentification(Packet):
"""
Request a node identification. This must be the first packet for any
connection. Any -> Any.
connection.
:nodes: * -> *
"""
poll_thread = True
......@@ -678,6 +686,7 @@ class RequestIdentification(Packet):
PUUID('uuid'),
PAddress('address'),
PString('name'),
PList('devpath', PString('devid')),
PFloat('id_timestamp'),
)
......@@ -691,7 +700,9 @@ class RequestIdentification(Packet):
class PrimaryMaster(Packet):
"""
Ask current primary master's uuid. CTL -> A.
Ask node identifier of the current primary master.
:nodes: ctl -> A
"""
_answer = PStruct('answer_primary',
PUUID('primary_uuid'),
......@@ -699,7 +710,10 @@ class PrimaryMaster(Packet):
class NotPrimaryMaster(Packet):
"""
Send list of known master nodes. SM -> Any.
Notify peer that I'm not the primary master. Attach any extra information
to help the peer joining the cluster.
:nodes: SM -> *
"""
_fmt = PStruct('not_primary_master',
PSignedNull('primary'),
......@@ -710,7 +724,10 @@ class NotPrimaryMaster(Packet):
class Recovery(Packet):
"""
Ask all data needed by master to recover. PM -> S, S -> PM.
Ask storage nodes data needed by master to recover.
Reused by `neoctl print ids`.
:nodes: M -> S; ctl -> A -> M
"""
_answer = PStruct('answer_recovery',
PPTID('ptid'),
......@@ -721,7 +738,9 @@ class Recovery(Packet):
class LastIDs(Packet):
"""
Ask the last OID/TID so that a master can initialize its TransactionManager.
PM -> S, S -> PM.
Reused by `neoctl print ids`.
:nodes: M -> S; ctl -> A -> M
"""
_answer = PStruct('answer_last_ids',
POID('last_oid'),
......@@ -730,8 +749,10 @@ class LastIDs(Packet):
class PartitionTable(Packet):
"""
Ask the full partition table. PM -> S.
Answer rows in a partition table. S -> PM.
Ask storage node the remaining data needed by master to recover.
This is also how the clients get the full partition table on connection.
:nodes: M -> S; C -> M
"""
_answer = PStruct('answer_partition_table',
PPTID('ptid'),
......@@ -740,7 +761,9 @@ class PartitionTable(Packet):
class NotifyPartitionTable(Packet):
"""
Send rows in a partition table to update other nodes. PM -> S, C.
Send the full partition table to admin/storage nodes on connection.
:nodes: M -> A, S
"""
_fmt = PStruct('send_partition_table',
PPTID('ptid'),
......@@ -749,8 +772,9 @@ class NotifyPartitionTable(Packet):
class PartitionChanges(Packet):
"""
Notify a subset of a partition table. This is used to notify changes.
PM -> S, C.
Notify about changes in the partition table.
:nodes: M -> *
"""
_fmt = PStruct('notify_partition_changes',
PPTID('ptid'),
......@@ -765,8 +789,10 @@ class PartitionChanges(Packet):
class StartOperation(Packet):
"""
Tell a storage nodes to start an operation. Until a storage node receives
this message, it must not serve client nodes. PM -> S.
Tell a storage node to start operation. Before this message, it must only
communicate with the primary master.
:nodes: M -> S
"""
_fmt = PStruct('start_operation',
# XXX: Is this boolean needed ? Maybe this
......@@ -776,14 +802,17 @@ class StartOperation(Packet):
class StopOperation(Packet):
"""
Tell a storage node to stop an operation. Once a storage node receives
this message, it must not serve client nodes. PM -> S.
Notify that the cluster is not operational anymore. Any operation between
nodes must be aborted.
:nodes: M -> S, C
"""
class UnfinishedTransactions(Packet):
"""
Ask unfinished transactions S -> PM.
Answer unfinished transactions PM -> S.
Ask unfinished transactions, which will be replicated when they're finished.
:nodes: S -> M
"""
_fmt = PStruct('ask_unfinished_transactions',
PList('row_list',
......@@ -800,8 +829,10 @@ class UnfinishedTransactions(Packet):
class LockedTransactions(Packet):
"""
Ask locked transactions PM -> S.
Answer locked transactions S -> PM.
Ask locked transactions to replay committed transactions that haven't been
unlocked.
:nodes: M -> S
"""
_answer = PStruct('answer_locked_transactions',
PDict('tid_dict',
......@@ -812,7 +843,10 @@ class LockedTransactions(Packet):
class FinalTID(Packet):
"""
Return final tid if ttid has been committed. * -> S. C -> PM.
Return final tid if ttid has been committed, to recover from certain
failures during tpc_finish.
:nodes: M -> S; C -> M, S
"""
_fmt = PStruct('final_tid',
PTID('ttid'),
......@@ -824,7 +858,9 @@ class FinalTID(Packet):
class ValidateTransaction(Packet):
"""
Commit a transaction. PM -> S.
Do replay a committed transaction that was not unlocked.
:nodes: M -> S
"""
_fmt = PStruct('validate_transaction',
PTID('ttid'),
......@@ -833,8 +869,9 @@ class ValidateTransaction(Packet):
class BeginTransaction(Packet):
"""
Ask to begin a new transaction. C -> PM.
Answer when a transaction begin, give a TID if necessary. PM -> C.
Ask to begin a new transaction. This maps to `tpc_begin`.
:nodes: C -> M
"""
_fmt = PStruct('ask_begin_transaction',
PTID('tid'),
......@@ -846,8 +883,10 @@ class BeginTransaction(Packet):
class FailedVote(Packet):
"""
Report storage nodes for which vote failed. C -> M
Report storage nodes for which vote failed.
True is returned if it's still possible to finish the transaction.
:nodes: C -> M
"""
_fmt = PStruct('failed_vote',
PTID('tid'),
......@@ -858,8 +897,10 @@ class FailedVote(Packet):
class FinishTransaction(Packet):
"""
Finish a transaction. C -> PM.
Answer when a transaction is finished. PM -> C.
Finish a transaction. Return the TID of the committed transaction.
This maps to `tpc_finish`.
:nodes: C -> M
"""
poll_thread = True
......@@ -878,8 +919,9 @@ class FinishTransaction(Packet):
class NotifyTransactionFinished(Packet):
"""
Notify that a transaction blocking a replication is now finished
M -> S
Notify that a transaction blocking a replication is now finished.
:nodes: M -> S
"""
_fmt = PStruct('notify_transaction_finished',
PTID('ttid'),
......@@ -888,8 +930,9 @@ class NotifyTransactionFinished(Packet):
class LockInformation(Packet):
"""
Lock information on a transaction. PM -> S.
Notify information on a transaction locked. S -> PM.
Commit a transaction. The new data is read-locked.
:nodes: M -> S
"""
_fmt = PStruct('ask_lock_informations',
PTID('ttid'),
......@@ -902,7 +945,10 @@ class LockInformation(Packet):
class InvalidateObjects(Packet):
"""
Invalidate objects. PM -> C.
Notify about a new transaction modifying objects,
invalidating client caches.
:nodes: M -> C
"""
_fmt = PStruct('ask_finish_transaction',
PTID('tid'),
......@@ -911,7 +957,10 @@ class InvalidateObjects(Packet):
class UnlockInformation(Packet):
"""
Unlock information on a transaction. PM -> S.
Notify about a successfully committed transaction. The new data can be
unlocked.
:nodes: M -> S
"""
_fmt = PStruct('notify_unlock_information',
PTID('ttid'),
......@@ -919,8 +968,9 @@ class UnlockInformation(Packet):
class GenerateOIDs(Packet):
"""
Ask new object IDs. C -> PM.
Answer new object IDs. PM -> C.
Ask new OIDs to create objects.
:nodes: C -> M
"""
_fmt = PStruct('ask_new_oids',
PNumber('num_oids'),
......@@ -932,8 +982,10 @@ class GenerateOIDs(Packet):
class Deadlock(Packet):
"""
Ask master to generate a new TTID that will be used by the client
to rebase a transaction. S -> PM -> C
Ask master to generate a new TTID that will be used by the client to solve
a deadlock by rebasing the transaction on top of concurrent changes.
:nodes: S -> M -> C
"""
_fmt = PStruct('notify_deadlock',
PTID('ttid'),
......@@ -942,7 +994,9 @@ class Deadlock(Packet):
class RebaseTransaction(Packet):
"""
Rebase transaction. C -> S.
Rebase a transaction to solve a deadlock.
:nodes: C -> S
"""
_fmt = PStruct('ask_rebase_transaction',
PTID('ttid'),
......@@ -955,7 +1009,9 @@ class RebaseTransaction(Packet):
class RebaseObject(Packet):
"""
Rebase object. C -> S.
Rebase an object change to solve a deadlock.
:nodes: C -> S
XXX: It is a request packet to simplify the implementation. For more
efficiency, this should be turned into a notification, and the
......@@ -981,9 +1037,11 @@ class RebaseObject(Packet):
class StoreObject(Packet):
"""
Ask to store an object. Send an OID, an original serial, a current
transaction ID, and data. C -> S.
Ask to create/modify an object. This maps to `store`.
As for IStorage, 'serial' is ZERO_TID for new objects.
:nodes: C -> S
"""
_fmt = PStruct('ask_store_object',
POID('oid'),
......@@ -1001,7 +1059,9 @@ class StoreObject(Packet):
class AbortTransaction(Packet):
"""
Abort a transaction. C -> S and C -> PM -> S.
Abort a transaction. This maps to `tpc_abort`.
:nodes: C -> S; C -> M -> S
"""
_fmt = PStruct('abort_transaction',
PTID('tid'),
......@@ -1010,8 +1070,9 @@ class AbortTransaction(Packet):
class StoreTransaction(Packet):
"""
Ask to store a transaction. C -> S.
Answer if transaction has been stored. S -> C.
Ask to store a transaction. Implies vote.
:nodes: C -> S
"""
_fmt = PStruct('ask_store_transaction',
PTID('tid'),
......@@ -1024,8 +1085,9 @@ class StoreTransaction(Packet):
class VoteTransaction(Packet):
"""
Ask to store a transaction. C -> S.
Answer if transaction has been stored. S -> C.
Ask to vote a transaction.
:nodes: C -> S
"""
_fmt = PStruct('ask_vote_transaction',
PTID('tid'),
......@@ -1034,15 +1096,15 @@ class VoteTransaction(Packet):
class GetObject(Packet):
"""
Ask a stored object by its OID and a serial or a TID if given. If a serial
is specified, the specified revision of an object will be returned. If
a TID is specified, an object right before the TID will be returned. C -> S.
Answer the requested object. S -> C.
Ask a stored object by its OID, optionally at/before a specific tid.
This maps to `load/loadBefore/loadSerial`.
:nodes: C -> S
"""
_fmt = PStruct('ask_object',
POID('oid'),
PTID('serial'),
PTID('tid'),
PTID('at'),
PTID('before'),
)
_answer = PStruct('answer_object',
......@@ -1058,8 +1120,9 @@ class GetObject(Packet):
class TIDList(Packet):
"""
Ask for TIDs between a range of offsets. The order of TIDs is descending,
and the range is [first, last). C -> S.
Answer the requested TIDs. S -> C.
and the range is [first, last). This maps to `undoLog`.
:nodes: C -> S
"""
_fmt = PStruct('ask_tids',
PIndex('first'),
......@@ -1074,8 +1137,9 @@ class TIDList(Packet):
class TIDListFrom(Packet):
"""
Ask for length TIDs starting at min_tid. The order of TIDs is ascending.
C -> S.
Answer the requested TIDs. S -> C
Used by `iterator`.
:nodes: C -> S
"""
_fmt = PStruct('tid_list_from',
PTID('min_tid'),
......@@ -1090,8 +1154,9 @@ class TIDListFrom(Packet):
class TransactionInformation(Packet):
"""
Ask information about a transaction. Any -> S.
Answer information (user, description) about a transaction. S -> Any.
Ask for transaction metadata.
:nodes: C -> S
"""
_fmt = PStruct('ask_transaction_information',
PTID('tid'),
......@@ -1109,8 +1174,9 @@ class TransactionInformation(Packet):
class ObjectHistory(Packet):
"""
Ask history information for a given object. The order of serials is
descending, and the range is [first, last]. C -> S.
Answer history information (serial, size) for an object. S -> C.
descending, and the range is [first, last]. This maps to `history`.
:nodes: C -> S
"""
_fmt = PStruct('ask_object_history',
POID('oid'),
......@@ -1125,9 +1191,9 @@ class ObjectHistory(Packet):
class PartitionList(Packet):
"""
All the following messages are for neoctl to admin node
Ask information about partition
Answer information about partition
Ask information about partitions.
:nodes: ctl -> A
"""
_fmt = PStruct('ask_partition_list',
PNumber('min_offset'),
......@@ -1142,8 +1208,9 @@ class PartitionList(Packet):
class NodeList(Packet):
"""
Ask information about nodes
Answer information about nodes
Ask information about nodes.
:nodes: ctl -> A
"""
_fmt = PStruct('ask_node_list',
PFNodeType,
......@@ -1155,7 +1222,9 @@ class NodeList(Packet):
class SetNodeState(Packet):
"""
Set the node state
Change the state of a node.
:nodes: ctl -> A -> M
"""
_fmt = PStruct('set_node_state',
PUUID('uuid'),
......@@ -1166,7 +1235,10 @@ class SetNodeState(Packet):
class AddPendingNodes(Packet):
"""
Ask the primary to include some pending node in the partition table
Mark given pending nodes as running, for future inclusion when tweaking
the partition table.
:nodes: ctl -> A -> M
"""
_fmt = PStruct('add_pending_nodes',
PFUUIDList,
......@@ -1176,7 +1248,10 @@ class AddPendingNodes(Packet):
class TweakPartitionTable(Packet):
"""
Ask the primary to optimize the partition table. A -> PM.
Ask the master to balance the partition table, optionally excluding
specific nodes in anticipation of removing them.
:nodes: ctl -> A -> M
"""
_fmt = PStruct('tweak_partition_table',
PFUUIDList,
......@@ -1186,7 +1261,9 @@ class TweakPartitionTable(Packet):
class NotifyNodeInformation(Packet):
"""
Notify information about one or more nodes. PM -> Any.
Notify information about one or more nodes.
:nodes: M -> *
"""
_fmt = PStruct('notify_node_informations',
PFloat('id_timestamp'),
......@@ -1195,7 +1272,9 @@ class NotifyNodeInformation(Packet):
class SetClusterState(Packet):
"""
Set the cluster state
Set the cluster state.
:nodes: ctl -> A -> M
"""
_fmt = PStruct('set_cluster_state',
PEnum('state', ClusterStates),
......@@ -1205,7 +1284,9 @@ class SetClusterState(Packet):
class Repair(Packet):
"""
Ask storage nodes to repair their databases. ctl -> A -> M
Ask storage nodes to repair their databases.
:nodes: ctl -> A -> M
"""
_flags = map(PBoolean, ('dry_run',
# 'prune_orphan' (commented because it's the only option for the moment)
......@@ -1218,13 +1299,18 @@ class Repair(Packet):
class RepairOne(Packet):
"""
See Repair. M -> S
Repair is translated to this message, asking a specific storage node to
repair its database.
:nodes: M -> S
"""
_fmt = PStruct('repair', *Repair._flags)
class ClusterInformation(Packet):
"""
Notify information about the cluster
Notify about a cluster state change.
:nodes: M -> *
"""
_fmt = PStruct('notify_cluster_information',
PEnum('state', ClusterStates),
......@@ -1232,8 +1318,9 @@ class ClusterInformation(Packet):
class ClusterState(Packet):
"""
Ask state of the cluster
Answer state of the cluster
Ask the state of the cluster.
:nodes: ctl -> A; A -> M
"""
_answer = PStruct('answer_cluster_state',
......@@ -1244,8 +1331,7 @@ class ObjectUndoSerial(Packet):
"""
Ask storage the serial where object data is when undoing given transaction,
for a list of OIDs.
C -> S
Answer serials at which object data is when undoing a given transaction.
object_tid_dict has the following format:
key: oid
value: 3-tuple
......@@ -1255,7 +1341,8 @@ class ObjectUndoSerial(Packet):
Where undone data is (tid at which data is before given undo).
is_current (bool)
If current_serial's data is current on storage.
S -> C
:nodes: C -> S
"""
_fmt = PStruct('ask_undo_transaction',
PTID('tid'),
......@@ -1277,12 +1364,11 @@ class ObjectUndoSerial(Packet):
class CheckCurrentSerial(Packet):
"""
Verifies if given serial is current for object oid in the database, and
take a write lock on it (so that this state is not altered until
transaction ends).
Answer to AskCheckCurrentSerial.
Same structure as AnswerStoreObject, to handle the same way, except there
is nothing to invalidate in any client's cache.
Check if given serial is current for the given oid, and lock it so that
this state is not altered until transaction ends.
This maps to `checkCurrentSerialInTransaction`.
:nodes: C -> S
"""
_fmt = PStruct('ask_check_current_serial',
PTID('tid'),
......@@ -1295,11 +1381,8 @@ class CheckCurrentSerial(Packet):
class Pack(Packet):
"""
Request a pack at given TID.
C -> M
M -> S
Inform that packing is over.
S -> M
M -> C
:nodes: C -> M -> S
"""
_fmt = PStruct('ask_pack',
PTID('tid'),
......@@ -1311,8 +1394,10 @@ class Pack(Packet):
class CheckReplicas(Packet):
"""
ctl -> A
A -> M
Ask the cluster to search for mismatches between replicas, metadata only,
and optionally within a specific range. Reference nodes can be specified.
:nodes: ctl -> A -> M
"""
_fmt = PStruct('check_replicas',
PDict('partition_dict',
......@@ -1326,7 +1411,11 @@ class CheckReplicas(Packet):
class CheckPartition(Packet):
"""
M -> S
Ask a storage node to compare a partition with all other nodes.
Like for CheckReplicas, only metadata are checked, optionally within a
specific range. A reference node can be specified.
:nodes: M -> S
"""
_fmt = PStruct('check_partition',
PNumber('partition'),
......@@ -1343,11 +1432,8 @@ class CheckTIDRange(Packet):
Ask some stats about a range of transactions.
Used to know if there are differences between a replicating node and
reference node.
S -> S
Stats about a range of transactions.
Used to know if there are differences between a replicating node and
reference node.
S -> S
:nodes: S -> S
"""
_fmt = PStruct('ask_check_tid_range',
PNumber('partition'),
......@@ -1367,11 +1453,8 @@ class CheckSerialRange(Packet):
Ask some stats about a range of object history.
Used to know if there are differences between a replicating node and
reference node.
S -> S
Stats about a range of object history.
Used to know if there are differences between a replicating node and
reference node.
S -> S
:nodes: S -> S
"""
_fmt = PStruct('ask_check_serial_range',
PNumber('partition'),
......@@ -1391,7 +1474,9 @@ class CheckSerialRange(Packet):
class PartitionCorrupted(Packet):
"""
S -> M
Notify that mismatches were found while checking replicas for a partition.
:nodes: S -> M
"""
_fmt = PStruct('partition_corrupted',
PNumber('partition'),
......@@ -1403,9 +1488,8 @@ class PartitionCorrupted(Packet):
class LastTransaction(Packet):
"""
Ask last committed TID.
C -> M
Answer last committed TID.
M -> C
:nodes: C -> M; ctl -> A -> M
"""
poll_thread = True
......@@ -1415,16 +1499,17 @@ class LastTransaction(Packet):
class NotifyReady(Packet):
"""
Notify that node is ready to serve requests.
S -> M
"""
pass
Notify that we're ready to serve requests.
# replication
:nodes: S -> M
"""
class FetchTransactions(Packet):
"""
S -> S
Ask a storage node to send all transaction data we don't have,
and reply with the list of transactions we should not have.
:nodes: S -> S
"""
_fmt = PStruct('ask_transaction_list',
PNumber('partition'),
......@@ -1441,7 +1526,9 @@ class FetchTransactions(Packet):
class AddTransaction(Packet):
"""
S -> S
Send metadata of a transaction to a node that does not have them.
:nodes: S -> S
"""
nodelay = False
......@@ -1457,7 +1544,10 @@ class AddTransaction(Packet):
class FetchObjects(Packet):
"""
S -> S
Ask a storage node to send object records we don't have,
and reply with the list of records we should not have.
:nodes: S -> S
"""
_fmt = PStruct('ask_object_list',
PNumber('partition'),
......@@ -1482,7 +1572,9 @@ class FetchObjects(Packet):
class AddObject(Packet):
"""
S -> S
Send an object record to a node that does not have it.
:nodes: S -> S
"""
nodelay = False
......@@ -1499,11 +1591,12 @@ class Replicate(Packet):
"""
Notify a storage node to replicate partitions up to given 'tid'
and from given sources.
M -> S
- upstream_name: replicate from an upstream cluster
- address: address of the source storage node, or None if there's no new
data up to 'tid' for the given partition
:nodes: M -> S
"""
_fmt = PStruct('replicate',
PTID('tid'),
......@@ -1518,7 +1611,8 @@ class ReplicationDone(Packet):
"""
Notify the master node that a partition has been successfully replicated
from a storage to another.
S -> M
:nodes: S -> M
"""
_fmt = PStruct('notify_replication_done',
PNumber('offset'),
......@@ -1528,6 +1622,8 @@ class ReplicationDone(Packet):
class Truncate(Packet):
"""
Request DB to be truncated. Also used to leave backup mode.
:nodes: ctl -> A -> M; M -> S
"""
_fmt = PStruct('truncate',
PTID('tid'),
......@@ -1536,16 +1632,16 @@ class Truncate(Packet):
_answer = Error
StaticRegistry = {}
_next_code = 0
def register(request, ignore_when_closed=None):
""" Register a packet in the packet registry """
code = len(StaticRegistry)
global _next_code
code = _next_code
assert code < RESPONSE_MASK
_next_code = code + 1
if request is Error:
code |= RESPONSE_MASK
# register the request
StaticRegistry[code] = request
if request is None:
return # None registered only to skip a code number (for compatibility)
request._code = code
answer = request._answer
if ignore_when_closed is None:
......@@ -1558,32 +1654,28 @@ def register(request, ignore_when_closed=None):
if answer in (Error, None):
return request
# build a class for the answer
answer = type('Answer%s' % (request.__name__, ), (Packet, ), {})
answer = type('Answer' + request.__name__, (Packet, ), {})
answer._fmt = request._answer
answer.poll_thread = request.poll_thread
# compute the answer code
code = code | RESPONSE_MASK
answer._request = request
assert answer._code is None, "Answer of %s is already used" % (request, )
answer._code = code
answer._code = code | RESPONSE_MASK
request._answer = answer
# and register the answer packet
assert code not in StaticRegistry, "Duplicate response packet code"
StaticRegistry[code] = answer
return (request, answer)
return request, answer
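The code allocation done by `register` can be illustrated with a reduced sketch. It is not the project's implementation: packets are represented by plain names, and the value of `RESPONSE_MASK` is an assumption made for the example (the real constant is defined elsewhere in the protocol module).

```python
# Assumption: illustrative value only; the real RESPONSE_MASK is defined
# elsewhere in the protocol module.
RESPONSE_MASK = 0x8000

registry = {}
_next_code = 0

def register(name, has_answer=True):
    """Allocate the next request code and, if needed, its answer code."""
    global _next_code
    code = _next_code
    assert code < RESPONSE_MASK
    _next_code = code + 1
    registry[code] = name
    if not has_answer:
        return code, None
    # The answer shares the request code, with the response bit set.
    answer_code = code | RESPONSE_MASK
    assert answer_code not in registry, "Duplicate response packet code"
    registry[answer_code] = 'Answer' + name
    return code, answer_code

ask_pt = register('PartitionTable')
notify = register('NotifyReady', has_answer=False)
ask_tid = register('LastTransaction')
```

Request codes come from a dedicated counter, so they stay sequential regardless of how many answer entries the registry holds.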
class Packets(dict):
"""
Packet registry that checks packet code uniqueness and provides an index
"""
def __metaclass__(name, base, d):
# this builds a "singleton"
cls = type('PacketRegistry', base, d)()
for k, v in d.iteritems():
if isinstance(v, type) and issubclass(v, Packet):
v.handler_method_name = k[0].lower() + k[1:]
# this builds a "singleton"
return type('PacketRegistry', base, d)(StaticRegistry)
cls[v._code] = v
return cls
# notifications
Error = register(
Error)
RequestIdentification, AcceptIdentification = register(
......
......@@ -15,12 +15,12 @@
# along with this program. If not, see <http://www.gnu.org/licenses/>.
import socket
import os, socket
from binascii import a2b_hex, b2a_hex
from datetime import timedelta, datetime
from hashlib import sha1
from Queue import deque
from struct import pack, unpack
from struct import pack, unpack, Struct
from time import gmtime
TID_LOW_OVERFLOW = 2**32
......@@ -102,11 +102,10 @@ def addTID(ptid, offset):
higher = (d.year, d.month, d.day, d.hour, d.minute)
return packTID(higher, lower)
def u64(s):
return unpack('!Q', s)[0]
def p64(n):
return pack('!Q', n)
p64, u64 = (lambda unpack: (
    unpack.__self__.pack,
    lambda s: unpack(s)[0]
    ))(Struct('!Q').unpack)
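For readers unfamiliar with the idiom, the new `p64`/`u64` definitions can be spelled more conventionally as below. This is an equivalent sketch, not the committed code; `_q` is a name introduced here. The point of the change is that a precompiled `Struct` object is reused instead of passing the format string to `pack`/`unpack` on every call.

```python
from struct import Struct

_q = Struct('!Q')  # network byte order, unsigned 64-bit integer

# Packing: bound method of the precompiled Struct.
p64 = _q.pack

def u64(s):
    """Unpack an 8-byte string into an integer."""
    return _q.unpack(s)[0]
```

A round trip such as `u64(p64(n))` returns `n` for any integer in the range 0 to 2**64 - 1.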
def add64(packed, offset):
"""Add a python number to a 64-bits packed value"""
......@@ -226,3 +225,25 @@ class cached_property(object):
if obj is None: return self
value = obj.__dict__[self.func.__name__] = self.func(obj)
return value
# This module is always imported before multiprocessing is used, and the
# main process does not want to change name when tasks are run in threads.
spt_pid = os.getpid()

def setproctitle(title):
    global spt_pid
    pid = os.getpid()
    if spt_pid == pid:
        return
    spt_pid = pid
    # Try using https://pypi.org/project/setproctitle/
    try:
        # On Linux, this is done by clobbering argv, and the main process
        # usually has a longer command line than the title of subprocesses.
        os.environ['SPT_NOENV'] = '1'
        from setproctitle import setproctitle
    except ImportError:
        return
    finally:
        del os.environ['SPT_NOENV']
    setproctitle(title)
......@@ -24,7 +24,7 @@ from ..app import monotonic_time
class IdentificationHandler(EventHandler):
def requestIdentification(self, conn, node_type, uuid,
address, name, id_timestamp):
address, name, devpath, id_timestamp):
app = self.app
self.checkClusterName(name)
if address == app.server:
......@@ -101,6 +101,8 @@ class IdentificationHandler(EventHandler):
uuid=uuid, address=address)
else:
node.setUUID(uuid)
if devpath:
node.devpath = tuple(devpath)
node.id_timestamp = monotonic_time()
node.setState(state)
conn.setHandler(handler)
......@@ -120,7 +122,7 @@ class IdentificationHandler(EventHandler):
class SecondaryIdentificationHandler(EventHandler):
def requestIdentification(self, conn, node_type, uuid,
address, name, id_timestamp):
address, name, devpath, id_timestamp):
app = self.app
self.checkClusterName(name)
if address == app.server:
......
......@@ -38,7 +38,7 @@ class ElectionHandler(MasterHandler):
super(ElectionHandler, self).connectionCompleted(conn)
app = self.app
conn.ask(Packets.RequestIdentification(NodeTypes.MASTER,
app.uuid, app.server, app.name, app.election))
app.uuid, app.server, app.name, (), app.election))
def connectionFailed(self, conn):
super(ElectionHandler, self).connectionFailed(conn)
......
......@@ -178,7 +178,7 @@ class PartitionTable(neo.lib.pt.PartitionTable):
def tweak(self, drop_list=()):
"""Optimize partition table
This reassigns cells in 3 ways:
This reassigns cells in 4 ways:
- Discard cells of nodes listed in 'drop_list'. For partitions with too
few readable cells, some cells are instead marked as FEEDING. This is
a preliminary step to drop these nodes, otherwise the partition table
......@@ -187,6 +187,8 @@ class PartitionTable(neo.lib.pt.PartitionTable):
- When a transaction creates new objects (oids are roughly allocated
sequentially), we expect better performance by maximizing the number
of involved nodes (i.e. parallelizing writes).
- For maximum resiliency, cells of each partition are assigned as far
as possible from each other, by checking the topology path of nodes.
Examples of optimal partition tables with np=10, nr=1 and 5 nodes:
......@@ -215,6 +217,17 @@ class PartitionTable(neo.lib.pt.PartitionTable):
U. .U U.
.U U. U.
U. U. .U
For the topology, let's consider an example with paths of the form
(room, machine, disk):
- if there are more rooms than the number of replicas, 2 cells of the
same partition must not be assigned in the same room;
- otherwise, topology paths are checked at a deeper depth,
e.g. not on the same machine and distributed evenly
(off by 1) among rooms.
But the topology is expected to be optimal, otherwise it is ignored.
In some cases, we could fall back to a non-optimal topology but
that would cause extra replication if the user wants to fix it.
"""
# Collect some data in a usable form for the rest of the method.
node_list = {node: {} for node in self.count_dict
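The "distributed evenly (off by 1)" rule from the docstring can be illustrated by the per-depth limit that this commit derives with `divmod(repeats, n)`. The helper below is a sketch: the function name is hypothetical, and `repeats` is assumed to be the number of cells assigned to each partition.

```python
def max_cells_per_group(repeats, groups):
    """Return (limit, count) for one depth of the topology.

    limit: maximum number of cells of one partition in a single group
           (e.g. a room).
    count: how many groups are allowed to reach that limit.
    """
    q, r = divmod(repeats, groups)
    # With a remainder, r groups hold q + 1 cells and the others hold q.
    # Without one, every group holds exactly q cells.
    return (q + 1, r) if r else (q, groups)

examples = {
    (3, 2): max_cells_per_group(3, 2),  # 3 cells over 2 rooms
    (2, 3): max_cells_per_group(2, 3),  # 2 cells over 3 rooms
    (4, 2): max_cells_per_group(4, 2),  # 4 cells over 2 rooms
}
```

With 3 cells and 2 rooms the result is `(2, 1)`: one room may hold 2 cells of a partition and the other holds 1. With 2 cells and 3 rooms the result is `(1, 2)`, so no room ever holds 2 cells of the same partition.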
......@@ -242,6 +255,67 @@ class PartitionTable(neo.lib.pt.PartitionTable):
i += 1
option_dict = Counter(map(tuple, x))
        # Initialize variables/functions to optimize the topology.
        devpath_max = []
        devpaths = [()] * node_count
        if repeats > 1:
            _devpaths = [x[0].devpath for x in node_list]
            max_depth = min(map(len, _devpaths))
            depth = 0
            while 1:
                if depth < max_depth:
                    depth += 1
                    x = Counter(x[:depth] for x in _devpaths)
                    n = len(x)
                    x = set(x.itervalues())
                    # TODO: Prove it works. If the code turns out to be:
                    # - too pessimistic, the topology is ignored when
                    #   resiliency could be maximized;
                    # - or worse too optimistic, in which case this
                    #   method raises, possibly after a very long time.
                    if len(x) == 1 or max(x) * repeats <= node_count:
                        i, x = divmod(repeats, n)
                        devpath_max.append((i + 1, x) if x else (i, n))
                        if n < repeats:
                            continue
                        devpaths = [x[:depth] for x in _devpaths]
                        break
                logging.warning("Can't maximize resiliency: fix the topology"
                    " of your storage nodes and make sure they're all running."
                    " %s storage device failure(s) may be enough to lose all"
                    " the database." % (repeats - 1))
                break
        topology = [{} for _ in xrange(self.np)]
        def update_topology():
            for offset in option:
                n = topology[offset]
                for i, (j, k) in zip(devpath, devpath_max):
                    try:
                        i, x = n[i]
                    except KeyError:
                        n[i] = i, x = [0, {}]
                    if i == j or i + 1 == j and k == sum(
                            1 for i in n.itervalues() if i[0] == j):
                        # Too many cells would be assigned at this topology
                        # node.
                        return False
                    n = x
            # The topology may be optimal with this option. Apply it.
            for offset in option:
                n = topology[offset]
                for i in devpath:
                    n = n[i]
                    n[0] += 1
                    n = n[1]
            return True
        def revert_topology():
            for offset in option:
                n = topology[offset]
                for i in devpath:
                    n = n[i]
                    n[0] -= 1
                    n = n[1]
# Strategies to find the "best" permutation of nodes.
def node_options():
# The second part of the key goes with the above cosmetic sort.
......@@ -291,24 +365,27 @@ class PartitionTable(neo.lib.pt.PartitionTable):
new = [] # the solution
stack = [] # data recursion
def options():
return iter(node_options[len(new)][-1])
x = node_options[len(new)]
return devpaths[x[-2]], iter(x[-1])
for node_options in node_options(): # for each strategy
iter_option = options()
devpath, iter_option = options()
while 1:
try:
option = next(iter_option)
except StopIteration: # 1st strategy only
except StopIteration:
if new:
iter_option = stack.pop()
option_dict[new.pop()] += 1
devpath, iter_option = stack.pop()
option = new.pop()
revert_topology()
option_dict[option] += 1
continue
break
if option_dict[option]:
if option_dict[option] and update_topology():
new.append(option)
if len(new) == len(node_list):
if len(new) == node_count:
break
stack.append(iter_option)
iter_option = options()
stack.append((devpath, iter_option))
devpath, iter_option = options()
option_dict[option] -= 1
if new:
break
......@@ -384,13 +461,18 @@ class PartitionTable(neo.lib.pt.PartitionTable):
if cell.isReadable():
if cell.getNode().isRunning():
lost = None
else :
else:
cell_list.append(cell)
for cell in cell_list:
if cell.getNode() is not lost:
cell.setState(CellStates.OUT_OF_DATE)
change_list.append((offset, cell.getUUID(),
CellStates.OUT_OF_DATE))
node = cell.getNode()
if node is not lost:
if cell.isFeeding():
self.removeCell(offset, node)
state = CellStates.DISCARDED
else:
state = CellStates.OUT_OF_DATE
cell.setState(state)
change_list.append((offset, node.getUUID(), state))
if fully_readable and change_list:
logging.warning(self._first_outdated_message)
return change_list
......
......@@ -65,6 +65,7 @@ UNIT_TEST_MODULES = [
'neo.tests.client.testZODBURI',
# light functional tests
'neo.tests.threaded.test',
'neo.tests.threaded.testConfig',
'neo.tests.threaded.testImporter',
'neo.tests.threaded.testReplication',
'neo.tests.threaded.testSSL',
......
......@@ -71,6 +71,7 @@ class Application(BaseApplication):
self.dm.setup(reset=config.getReset(), dedup=config.getDedup())
self.loadConfiguration()
self.devpath = self.dm.getTopologyPath()
# force node uuid from command line argument, for testing purpose only
if config.getUUID() is not None:
......@@ -203,7 +204,8 @@ class Application(BaseApplication):
pt = self.pt
# search, find, connect and identify to the primary master
bootstrap = BootstrapManager(self, NodeTypes.STORAGE, self.server)
bootstrap = BootstrapManager(self, NodeTypes.STORAGE, self.server,
self.devpath)
self.master_node, self.master_conn, num_partitions, num_replicas = \
bootstrap.getPrimaryConnection()
uuid = self.uuid
......
......@@ -51,7 +51,7 @@ class Checker(object):
else:
conn = ClientConnection(app, StorageOperationHandler(app), node)
conn.ask(Packets.RequestIdentification(NodeTypes.STORAGE,
uuid, app.server, name, app.id_timestamp))
uuid, app.server, name, (), app.id_timestamp))
self.conn_dict[conn] = node.isIdentified()
conn_set = set(self.conn_dict)
conn_set.discard(None)
......
......@@ -16,8 +16,6 @@
LOG_QUERIES = False
from neo.lib.exception import DatabaseFailure
DATABASE_MANAGER_DICT = {
'Importer': 'importer.ImporterDatabaseManager',
'MySQL': 'mysqldb.MySQLDatabaseManager',
......@@ -33,3 +31,6 @@ def getAdapterKlass(name):
def buildDatabaseManager(name, args=(), kw={}):
return getAdapterKlass(name)(*args, **kw)
class DatabaseFailure(Exception):
pass
......@@ -15,23 +15,39 @@
# along with this program. If not, see <http://www.gnu.org/licenses/>.
import os
import cPickle, pickle, time
import cPickle, pickle, sys, time
from bisect import bisect, insort
from collections import deque
from cStringIO import StringIO
from ConfigParser import SafeConfigParser
from ZODB.config import storageFromString
from ZConfig import loadConfigFile
from ZODB import BaseStorage
from ZODB.config import getStorageSchema, storageFromString
from ZODB.POSException import POSKeyError
from . import buildDatabaseManager
try:
from ZODB._compat import dumps, loads, _protocol
except ImportError:
from cPickle import dumps, loads
_protocol = 1
from ZODB.FileStorage import FileStorage
from . import buildDatabaseManager, DatabaseFailure
from .manager import DatabaseManager
from neo.lib import logging, patch, util
from neo.lib.exception import DatabaseFailure
from neo.lib import compress, logging, patch, util
from neo.lib.interfaces import implements
from neo.lib.protocol import BackendNotImplemented, MAX_TID
patch.speedupFileStorageTxnLookup()
FORK = sys.platform != 'win32'
def transactionAsTuple(txn):
ext = txn.extension
return (txn.user, txn.description,
dumps(ext, _protocol) if ext else '',
txn.status == 'p', txn.tid)
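The effect of `transactionAsTuple` on the extension field ("Serialize empty transaction extension with an empty string") can be sketched outside of ZODB. Everything below is illustrative: the `Txn` record is a hypothetical stand-in for a storage transaction record, and pickle protocol 2 is an assumption (the commit takes the protocol from `ZODB._compat` when available).

```python
import pickle
from collections import namedtuple

# Hypothetical stand-in for a ZODB transaction record.
Txn = namedtuple('Txn', 'user description extension status tid')

def transaction_as_tuple(txn, protocol=2):
    """Flatten a transaction record into a tuple of plain values."""
    ext = txn.extension
    return (txn.user, txn.description,
            # An empty extension is stored as an empty string,
            # not as a pickled empty dict.
            pickle.dumps(ext, protocol) if ext else b'',
            txn.status == 'p',  # True if the transaction was packed
            txn.tid)

empty = transaction_as_tuple(Txn(b'user', b'desc', {}, ' ', b'\0' * 8))
full = transaction_as_tuple(Txn(b'user', b'desc', {'k': 1}, 'p', b'\0' * 8))
```

A tuple of plain values is also convenient to pass between processes, which matters here because the import now runs in a separate process.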
class Reference(object):
__slots__ = "value",
......@@ -187,12 +203,37 @@ class ZODB(object):
def __getstate__(self):
state = self.__dict__.copy()
del state["data_tid"], state["storage"]
del state["_connect"], state["data_tid"], state["storage"]
return state
def connect(self, storage):
self.data_tid = {}
self.storage = storageFromString(storage)
config, _ = loadConfigFile(getStorageSchema(), StringIO(storage))
section = config.storage
def _connect():
self.storage = section.open()
self._connect = _connect
config = section.config
if 'read_only' in config.getSectionAttributes():
has_next_oid = config.read_only = hasattr(self, 'next_oid')
if not has_next_oid:
import gc
# This will reopen read-only as soon as we know the last oid.
def new_oid():
del self.new_oid
new_oid = self.storage.new_oid()
self.storage.close()
# A FileStorage index can be huge, and close() does not
# delete it. Stop reference it before loading it again,
# to avoid having it twice in memory.
del self.storage
gc.collect() # to be sure (maybe only required for PyPy,
# if one day we support it)
config.read_only = True
self._connect()
return new_oid
self.new_oid = new_oid
self._connect()
def setup(self, zodb_dict, shift_oid=0):
self.shift_oid = shift_oid
......@@ -259,13 +300,35 @@ class ZODB(object):
class ZODBIterator(object):
def __init__(self, zodb, *args, **kw):
iterator = zodb.iterator(*args, **kw)
def __new__(cls, zodb_list, *args):
def _init(zodb):
self = object.__new__(cls)
iterator = zodb.iterator(*args)
def _next():
self.transaction = next(iterator)
_next()
self.zodb = zodb
self.next = _next
return self
def init(zodb):
# FileStorage is fork-safe and in case we don't start iteration
# from the beginning, we want the tid index built at most once
# (by speedupFileStorageTxnLookup).
if FORK and not isinstance(zodb.storage, FileStorage):
def init():
zodb._connect()
return _init(zodb)
return init
return _init(zodb)
def result(zodb_list):
for self in zodb_list:
if callable(self):
self = self()
try:
self.next()
yield self
except StopIteration:
pass
return result(map(init, zodb_list))
tid = property(lambda self: self.transaction.tid)
......@@ -274,15 +337,18 @@ class ZODBIterator(object):
and self.zodb.shift_oid < other.zodb.shift_oid
is_true = ('false', 'true').index
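The `is_true` helper relies on `tuple.index`: it returns 0 for `'false'`, 1 for `'true'`, and raises `ValueError` for anything else. The sketch below shows how that supports a fallback for other option values. `parse_compress` is a hypothetical name, and returning the raw string is only a placeholder for whatever the real option parser does.

```python
is_true = ('false', 'true').index

def parse_compress(value):
    """Interpret a 'compress' option given as a string."""
    try:
        # 'false' -> 0 -> False, 'true' -> 1 -> True
        return bool(is_true(value))
    except ValueError:
        # Placeholder: the commit hands other values to a dedicated
        # option parser; here the raw string is returned unchanged.
        return value
```

Unlike `bool(value)`, this rejects or redirects misspelled values such as `'True'` or `'yes'` instead of silently treating every non-empty string as true.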
class ImporterDatabaseManager(DatabaseManager):
"""Proxy that transparently imports data from a ZODB storage
"""
_writeback = None
_last_commit = 0
def __init__(self, *args, **kw):
super(ImporterDatabaseManager, self).__init__(*args, **kw)
implements(self, """_getNextTID checkSerialRange checkTIDRange
deleteObject deleteTransaction dropPartitions getLastTID
deleteObject deleteTransaction dropPartitions _getLastTID
getReplicationObjectList _getTIDList nonempty""".split())
_getPartition = property(lambda self: self.db._getPartition)
......@@ -294,30 +360,58 @@ class ImporterDatabaseManager(DatabaseManager):
config.read(os.path.expanduser(database))
sections = config.sections()
# XXX: defaults copy & pasted from elsewhere - refactoring needed
main = {'adapter': 'MySQL', 'wait': 0}
main = self._conf = {'adapter': 'MySQL', 'wait': 0}
main.update(config.items(sections.pop(0)))
self.zodb = ((x, dict(config.items(x))) for x in sections)
self.compress = main.get('compress', 1)
self.db = buildDatabaseManager(main['adapter'],
(main['database'], main.get('engine'), main['wait']))
self.zodb = [(x, dict(config.items(x))) for x in sections]
x = main.get('compress', 'true')
try:
self.compress = bool(is_true(x))
except ValueError:
self.compress = compress.parseOption(x)
if is_true(main.get('writeback', 'false')):
if len(self.zodb) > 1:
raise Exception(
"Cannot forward new transactions to a split DB.")
self._writeback = self.zodb[0][1]['storage']
def _connect(self):
conf = self._conf
db = self.db = buildDatabaseManager(conf['adapter'],
(conf['database'], conf.get('engine'), conf['wait']))
for x in """getConfiguration _setConfiguration setNumPartitions
query erase getPartitionTable changePartitionTable
getUnfinishedTIDDict dropUnfinishedData abortTransaction
storeTransaction lockTransaction unlockTransaction
query erase getPartitionTable _iterAssignedCells
updateCellTID getUnfinishedTIDDict dropUnfinishedData
abortTransaction storeTransaction lockTransaction
loadData storeData getOrphanList _pruneData deferCommit
dropPartitionsTemporary
_getDevPath dropPartitionsTemporary
""".split():
setattr(self, x, getattr(self.db, x))
setattr(self, x, getattr(db, x))
if self._writeback:
self._writeback = WriteBack(db, self._writeback)
db_commit = db.commit
def commit():
db_commit()
self._last_commit = time.time()
if self._writeback:
self._writeback.committed()
self.commit = db.commit = commit
def _connect(self):
pass
def _updateReadable(self):
raise AssertionError
def commit(self):
self.db.commit()
# XXX: This misses commits done internally by self.db (lockTransaction).
self._last_commit = time.time()
def changePartitionTable(self, *args, **kw):
self.db.changePartitionTable(*args, **kw)
if self._writeback:
self._writeback.changed()
def unlockTransaction(self, *args):
self.db.unlockTransaction(*args)
if self._writeback:
self._writeback.changed()
def close(self):
if self._writeback:
self._writeback.close()
self.db.close()
if isinstance(self.zodb, list): # _setup called
for zodb in self.zodb:
......@@ -343,6 +437,7 @@ class ImporterDatabaseManager(DatabaseManager):
zodb = self.zodb[-1]
self.zodb_loid = zodb.shift_oid + zodb.next_oid - 1
self.zodb_tid = self.db.getLastTID(self.zodb_ltid) or 0
if callable(self._import):
self._import = self._import()
def doOperation(self, app):
......@@ -352,83 +447,100 @@ class ImporterDatabaseManager(DatabaseManager):
def _import(self):
p64 = util.p64
u64 = util.u64
tid = p64(self.zodb_tid + 1)
zodb_list = []
for zodb in self.zodb:
try:
zodb_list.append(ZODBIterator(zodb, tid, p64(self.zodb_ltid)))
except StopIteration:
pass
tid = None
def finish():
if tid:
self.storeTransaction(tid, object_list, (
(x[0] for x in object_list),
str(txn.user), str(txn.description),
cPickle.dumps(txn.extension),
txn.status == 'p', tid),
tid = p64(self.zodb_tid + 1) if self.zodb_tid else None
zodb_list = ZODBIterator(self.zodb, tid, p64(self.zodb_ltid))
if FORK:
from multiprocessing import Process
from ..shared_queue import Queue
queue = Queue(1<<24)
process = self._import_process = Process(
target=lambda zodb_list: queue(self._iter_zodb(zodb_list)),
args=(zodb_list,))
process.daemon = True
process.start()
else:
queue = self._iter_zodb(zodb_list)
process = None
del zodb_list
object_list = []
data_id_list = []
for txn in queue:
if txn is None:
break
if len(txn) == 3:
oid, data_id, data_tid = txn
if data_id is not None:
checksum, data, compression = data_id
data_id = self.holdData(checksum, oid, data, compression)
data_id_list.append(data_id)
object_list.append((oid, data_id, data_tid))
# Give the main loop the opportunity to process requests
# from other nodes. In particular, clients may commit. If the
# storage node exits after such commit, and before we actually
# update 'obj' with 'object_list', some rows in 'data' may be
# unreferenced. This is not a problem because the leak is
# solved when resuming the migration.
# XXX: The leak was solved by the deduplication,
# but it was disabled by default.
else:
tid = txn[-1]
self.storeTransaction(tid, object_list,
((x[0] for x in object_list),) + txn,
False)
self.releaseData(data_id_list)
logging.debug("TXN %s imported (user=%r, desc=%r, len(oid)=%s)",
util.dump(tid), txn.user, txn.description, len(object_list))
util.dump(tid), txn[0], txn[1], len(object_list))
del object_list[:], data_id_list[:]
if self._last_commit + 1 < time.time():
self.commit()
self.zodb_tid = u64(tid)
if self.compress:
from zlib import compress
else:
compress = None
compression = 0
object_list = []
data_id_list = []
while zodb_list:
yield
if process:
process.join()
self.commit()
logging.warning("All data are imported. You should change"
" your configuration to use the native backend and restart.")
self._import = None
for x in """getObject getReplicationTIDList getReplicationObjectList
""".split():
setattr(self, x, getattr(self.db, x))
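Reviewer note: the new `_import` loop reads a flat stream in which each transaction's object records (3-tuples) come first, followed by a longer tuple describing the transaction, whose last item is the tid. A minimal sketch of that framing follows. The names `produce` and `consume` and the header layout `(user, description, extension, packed, tid)` are assumptions made for illustration, not the importer's actual API.

```python
def produce(transactions):
    """Yield each transaction's object records, then its header tuple."""
    for header, records in transactions:
        for oid, data, data_tid in records:
            yield oid, data, data_tid   # 3-tuple: one object record
        yield header                    # longer tuple, last item is the tid


def consume(stream):
    """Group object records under the header that follows them."""
    imported = []
    object_list = []
    for item in stream:
        if len(item) == 3:
            # Object record: buffer it until the transaction header arrives.
            object_list.append(item)
        else:
            tid = item[-1]
            imported.append((tid, [oid for oid, _, _ in object_list]))
            del object_list[:]
    return imported


# Hypothetical header layout: (user, description, extension, packed, tid).
transactions = [
    (("alice", "first", "", False, 1), [(10, "a", None), (11, "b", None)]),
    (("bob", "second", "", False, 2), [(10, "c", None)]),
]
imported = consume(produce(transactions))
```

Distinguishing records by length only works as long as a header never has exactly three items, which holds for the layout assumed here.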
def _iter_zodb(self, zodb_list):
util.setproctitle('neostorage: import')
p64 = util.p64
u64 = util.u64
zodb_list = list(zodb_list)
if zodb_list:
tid = None
_compress = compress.getCompress(self.compress)
while 1:
zodb_list.sort()
z = zodb_list[0]
# Merge transactions with same tid. Only
# user/desc/ext from first ZODB are kept.
if tid != z.tid:
finish()
txn = z.transaction
tid = txn.tid
yield
if tid:
yield txn
txn = transactionAsTuple(z.transaction)
tid = txn[-1]
zodb = z.zodb
for r in z.transaction:
oid = p64(u64(r.oid) + zodb.shift_oid)
data_tid = r.data_txn
if data_tid or r.data is None:
data_id = None
data = None
else:
data = zodb.repickle(r.data)
if compress:
compressed_data = compress(data)
compression = len(compressed_data) < len(data)
if compression:
data = compressed_data
checksum = util.makeChecksum(data)
data_id = self.holdData(util.makeChecksum(data), oid, data,
compression)
data_id_list.append(data_id)
object_list.append((oid, data_id, data_tid))
# Give the main loop the opportunity to process requests
# from other nodes. In particular, clients may commit. If the
# storage node exits after such commit, and before we actually
# update 'obj' with 'object_list', some rows in 'data' may be
# unreferenced. This is not a problem because the leak is
# solved when resuming the migration.
yield
_, compression, data = _compress(zodb.repickle(r.data))
data = util.makeChecksum(data), data, compression
yield oid, data, data_tid
try:
z.next()
except StopIteration:
del zodb_list[0]
self._last_commit = 0
finish()
logging.warning("All data are imported. You should change"
" your configuration to use the native backend and restart.")
self._import = None
for x in """getObject getReplicationTIDList getReplicationObjectList
""".split():
setattr(self, x, getattr(self.db, x))
if not zodb_list:
break
yield txn
yield
def inZodb(self, oid, tid=None, before_tid=None):
return oid <= self.zodb_loid and (
......@@ -440,8 +552,8 @@ class ImporterDatabaseManager(DatabaseManager):
return zodb, oid - zodb.shift_oid
def getLastIDs(self):
tid, _, _, oid = self.db.getLastIDs()
return (max(tid, util.p64(self.zodb_ltid)), None, None,
tid, oid = self.db.getLastIDs()
return (max(tid, util.p64(self.zodb_ltid)),
max(oid, util.p64(self.zodb_loid)))
def getObject(self, oid, tid=None, before_tid=None):
......@@ -479,7 +591,7 @@ class ImporterDatabaseManager(DatabaseManager):
checksum = util.makeChecksum(value)
else:
# CAVEAT: Although we think loadBefore should not return an empty
# value for a deleted object (see comment in NEO Storage),
# value for a deleted object (BBB: fixed in ZODB4),
# there's no need to distinguish this case in the above
# except clause because it would be crazy to import a
# NEO DB using this backend.
......@@ -499,20 +611,19 @@ class ImporterDatabaseManager(DatabaseManager):
p64 = util.p64
shift_oid = zodb.shift_oid
return ([p64(u64(x.oid) + shift_oid) for x in txn],
txn.user, txn.description,
cPickle.dumps(txn.extension), 0, tid)
) + transactionAsTuple(txn)
else:
return self.db.getTransaction(tid, all)
def getFinalTID(self, ttid):
if u64(ttid) <= self.zodb_ltid and self._import:
if util.u64(ttid) <= self.zodb_ltid and self._import:
raise NotImplementedError
return self.db.getFinalTID(ttid)
def _deleteRange(self, partition, min_tid=None, max_tid=None):
# Even if everything is imported, we can't truncate below
# because it would import again if we restart with this backend.
if u64(min_tid) < self.zodb_ltid:
if min_tid < self.zodb_ltid:
raise NotImplementedError
self.db._deleteRange(partition, min_tid, max_tid)
......@@ -561,3 +672,120 @@ class ImporterDatabaseManager(DatabaseManager):
def pack(self, *args, **kw):
raise BackendNotImplemented(self.pack)
class WriteBack(object):
_changed = False
_process = None
def __init__(self, db, storage):
self._db = db
self._storage = storage
def close(self):
if self._process:
self._stop.set()
self._event.set()
self._process.join()
def changed(self):
self._changed = True
def committed(self):
if self._changed:
self._changed = False
if self._process:
self._event.set()
else:
if FORK:
from multiprocessing import Process, Event
else:
from threading import Thread as Process, Event
self._event = Event()
self._idle = Event()
self._stop = Event()
self._np = self._db.getNumPartitions()
self._db = cPickle.dumps(self._db, 2)
self._process = Process(target=self._run)
self._process.daemon = True
self._process.start()
@property
def wait(self):
# For unit tests.
return self._idle.wait
def _run(self):
util.setproctitle('neostorage: write back')
self._db = cPickle.loads(self._db)
try:
@self._db.autoReconnect
def _():
# Unfortunately, copyTransactionsFrom does not abort in case
# of failure, so we have to reopen.
zodb = storageFromString(self._storage)
try:
self.min_tid = util.add64(zodb.lastTransaction(), 1)
zodb.copyTransactionsFrom(self)
finally:
zodb.close()
finally:
self._idle.set()
self._db.close()
def iterator(self):
db = self._db
np = self._np
chunk_size = max(2, 1000 // np)
offset_list = xrange(np)
while 1:
with db:
# Check the partition table at the beginning of every
# transaction. Once the import is finished and at least one
# cell is replicated, it is possible that some cells of this node
# become outdated. In this case, wait for the next PT change.
if np == len(db._readable_set):
while 1:
tid_list = []
loop = False
for offset in offset_list:
x = db.getReplicationTIDList(
self.min_tid, MAX_TID, chunk_size, offset)
tid_list += x
if len(x) == chunk_size:
loop = True
if tid_list:
tid_list.sort()
for tid in tid_list:
if self._stop.is_set():
return
yield TransactionRecord(db, tid)
self.min_tid = util.add64(tid, 1)
if loop:
continue
break
if not self._event.is_set():
self._idle.set()
self._event.wait()
self._idle.clear()
self._event.clear()
if self._stop.is_set():
break
class TransactionRecord(BaseStorage.TransactionRecord):
def __init__(self, db, tid):
self._oid_list, user, desc, ext, _, _ = db.getTransaction(tid)
super(TransactionRecord, self).__init__(tid, ' ', user, desc,
loads(ext) if ext else {})
self._db = db
def __iter__(self):
tid = self.tid
for oid in self._oid_list:
_, compression, _, data, data_tid = self._db.fetchObject(oid, tid)
if data is not None:
data = compress.decompress_list[compression](data)
yield BaseStorage.DataRecord(oid, tid, data, data_tid)
......@@ -17,11 +17,14 @@
import os, errno, socket, struct, sys, threading
from collections import defaultdict
from contextlib import contextmanager
from copy import copy
from functools import wraps
from neo.lib import logging, util
from neo.lib.exception import DatabaseFailure
from neo.lib.interfaces import abstract, requires
from neo.lib.protocol import CellStates, NonReadableCell, ZERO_TID
from neo.lib.protocol import CellStates, NonReadableCell, MAX_TID, ZERO_TID
from . import DatabaseFailure
READABLE = CellStates.UP_TO_DATE, CellStates.FEEDING
def lazymethod(func):
def getter(self):
......@@ -60,7 +63,7 @@ class DatabaseManager(object):
LOCKED = "error: database is locked"
_deferred = 0
_duplicating = _repairing = None
_repairing = None
def __init__(self, database, engine=None, wait=None):
"""
......@@ -75,30 +78,56 @@ class DatabaseManager(object):
# But for unit tests, we really want to never retry.
self._wait = wait or 0
self._parse(database)
self._init_attrs = tuple(self.__dict__)
self._connect()
def __getattr__(self, attr):
if self._duplicating is None:
return self.__getattribute__(attr)
value = getattr(self._duplicating, attr)
setattr(self, attr, value)
return value
def __getstate__(self):
state = {x: getattr(self, x) for x in self._init_attrs}
assert state # otherwise, __setstate__ is not called
return state
def __setstate__(self, state):
self.__dict__.update(state)
# For the moment, no need to duplicate secondary connections.
#self._init_attrs = tuple(self.__dict__)
# Secondary connections don't lock.
self.LOCK = None
self._connect()
@contextmanager
def _duplicate(self):
cls = self.__class__
db = cls.__new__(cls)
db.LOCK = None
db._duplicating = self
try:
db._connect()
finally:
del db._duplicating
db = copy(self)
try:
yield db
finally:
db.close()
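Reviewer note: `_duplicate` now relies on `copy()` together with `__getstate__`/`__setstate__`, so that a copy carries only the parsed configuration and opens its own connection. The sketch below shows the idea with a stand-in class. `Connection`, its `database` attribute and the placeholder `conn` object are hypothetical, and it is written for Python 3 although the patched code targets Python 2.

```python
from copy import copy


class Connection(object):
    """Hypothetical stand-in for a manager that owns a live connection."""

    LOCK = "primary"

    def __init__(self, database):
        self.database = database
        # Record the configuration attributes before anything
        # connection-specific is created.
        self._init_attrs = tuple(self.__dict__)
        self._connect()

    def _connect(self):
        self.conn = object()  # placeholder for a real connection handle

    def __getstate__(self):
        # Only the configuration is copied or pickled, never the connection.
        return {x: getattr(self, x) for x in self._init_attrs}

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.LOCK = None      # secondary connections do not lock
        self._connect()


primary = Connection("test.db")
secondary = copy(primary)
```

Because `copy()` goes through the same state protocol as pickling, the same two methods also cover sending the manager to a subprocess.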
def __getattr__(self, attr):
if attr in ('_readable_set', '_getPartition', '_getReadablePartition'):
self._updateReadable()
return self.__getattribute__(attr)
def _partitionTableChanged(self):
try:
del (self._readable_set,
self._getPartition,
self._getReadablePartition)
except AttributeError:
pass
def __enter__(self):
assert not self.LOCK, "not a secondary connection"
# XXX: All config caching should be done in this class,
# rather than in backend classes.
self._config.clear()
self._partitionTableChanged()
def __exit__(self, t, v, tb):
if v is None:
# Deferring commits makes no sense for secondary connections.
assert not self._deferred
self._commit()
@abstract
def _parse(self, database):
"""Called during instantiation, to process database parameter."""
......@@ -107,6 +136,17 @@ class DatabaseManager(object):
def _connect(self):
"""Connect to the database"""
def autoReconnect(self, f):
"""
Placeholder for backends that may lose connection to the underlying
database: although a primary connection is reestablished transparently
when possible, secondary connections use transactions and they must
restart from the beginning.
For other backends, there's no expected transient failure so the
default implementation is to execute the given task exactly once.
"""
f()
def lock(self, db_path):
if self.LOCK:
assert self.__lock is None, self.__lock
......@@ -127,6 +167,15 @@ class DatabaseManager(object):
raise
sys.exit(self.LOCKED)
def _getDevPath(self):
"""
"""
@requires(_getDevPath)
def getTopologyPath(self):
# On Windows, st_dev only exists since Python 3.4
return socket.gethostname(), str(os.stat(self._getDevPath()).st_dev)
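Reviewer note: `getTopologyPath` identifies a storage node by host name and by the device that holds its data. A minimal standalone sketch, assuming a plain function `topology_path` (a hypothetical name) that takes the path directly instead of calling `_getDevPath`:

```python
import os
import socket
import tempfile


def topology_path(dev_path):
    """Return (hostname, device id) for the filesystem holding dev_path."""
    return socket.gethostname(), str(os.stat(dev_path).st_dev)


# Example: look up the device of a temporary file.
with tempfile.NamedTemporaryFile() as f:
    path = topology_path(f.name)
```

Two databases on the same host and the same device get the same tuple, which is what lets the master avoid placing replicas of one partition together.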
@abstract
def erase(self):
""""""
......@@ -147,7 +196,6 @@ class DatabaseManager(object):
"""
if reset:
self.erase()
self._readable_set = set()
self._uncommitted_data = defaultdict(int)
self._setup(dedup)
......@@ -250,10 +298,7 @@ class DatabaseManager(object):
Store the number of partitions into a database.
"""
self.setConfiguration('partitions', num_partitions)
try:
del self._getPartition, self._getReadablePartition
except AttributeError:
pass
self._partitionTableChanged()
def getNumReplicas(self):
"""
......@@ -314,52 +359,47 @@ class DatabaseManager(object):
except TypeError:
return -1
@abstract
def getPartitionTable(self, *nid):
"""Return a whole partition table as a sequence of rows. Each row
is again a tuple of an offset (row ID), the NID of a storage
node, and a cell state."""
# XXX: Consider splitting getLastIDs/_getLastIDs because
# sometimes the last oid is not wanted.
@abstract
def getLastTID(self, max_tid):
"""Return greatest tid in trans table that is <= given 'max_tid'
def _getLastTID(self, partition, max_tid=None):
"""Return tid of last transaction <= 'max_tid' in given 'partition'
Required only to import a DB using Importer backend.
max_tid must be in unpacked format.
tids are in unpacked format.
"""
Data from unassigned partitions must be ignored.
This is important because there may remain data from cells that have
been discarded, either due to --disable-drop-partitions option,
or in the future when dropping partitions is done in background
(because this is an expensive operation).
@requires(_getLastTID)
def getLastTID(self, max_tid=None):
"""Return tid of last transaction <= 'max_tid'
XXX: Given the TODO comment in getLastIDs, getting ids
from readable partitions should be enough.
tids are in unpacked format.
"""
if self.getNumPartitions():
return max(map(self._getLastTID, self._readable_set))
def _getLastIDs(self):
"""Return (trans, obj, max(oid)) where
both 'trans' and 'obj' are {partition: max(tid)}
def _getLastIDs(self, partition):
"""Return max(tid) & max(oid) for objects of given partition
Same as in getLastTID: data from unassigned partitions must be ignored.
Results are in unpacked format
"""
@requires(_getLastIDs)
def getLastIDs(self):
trans, obj, oid = self._getLastIDs()
if trans:
tid = max(trans.itervalues())
if obj:
tid = max(tid, max(obj.itervalues()))
else:
tid = max(obj.itervalues()) if obj else None
# TODO: Replication can't be resumed from the tids in 'trans' and 'obj'
# because outdated cells are writable and may contain recently
# committed data. We must save somewhere where replication was
# interrupted and return this information. For the moment, we
# tell the replicator to resume from the beginning.
trans = obj = {}
return tid, trans, obj, oid
"""Return max(tid) & max(oid) for readable data
It is important to ignore unassigned partitions because there may
remain data from cells that have been discarded, either due to
--disable-drop-partitions option, or in the future when dropping
partitions is done in background (as it is an expensive operation).
"""
x = self._readable_set
if x:
tid, oid = zip(*map(self._getLastIDs, x))
tid = max(self.getLastTID(None), max(tid))
oid = max(oid)
return (None if tid is None else util.p64(tid),
None if oid is None else util.p64(oid))
return None, None
def _getUnfinishedTIDDict(self):
""""""
......@@ -471,6 +511,22 @@ class DatabaseManager(object):
return (util.p64(serial), compression, checksum, data,
None if data_serial is None else util.p64(data_serial))
def _getPartitionTable(self):
"""Return a whole partition table as a sequence of rows. Each row
is again a tuple of an offset (row ID), the NID of a storage
node, and a cell state."""
@requires(_getPartitionTable)
def _iterAssignedCells(self):
my_nid = self.getUUID()
return ((offset, tid) for offset, nid, tid in self._getPartitionTable()
if my_nid == nid)
@requires(_getPartitionTable)
def getPartitionTable(self):
return [(offset, nid, max(0, -state))
for offset, nid, state in self._getPartitionTable()]
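Reviewer note: the `pt` table now stores a signed `tid` column instead of a state. Judging from `getPartitionTable` and the `CASE` expression in `_migrate3`, a non-negative value means an OUT_OF_DATE cell together with the tid it was replicated up to, and a negative value is the negated state. The sketch below spells this out. The numeric enum values are inferred from the migration, not taken from `neo.lib.protocol`, and the helper names are hypothetical.

```python
# Assumed numeric values, inferred from the migration's CASE expression.
OUT_OF_DATE, UP_TO_DATE, FEEDING = 0, 1, 2


def encode_cell(state, last_tid=0):
    """Return the value stored in the 'tid' column for a cell."""
    if state == OUT_OF_DATE:
        return last_tid   # >= 0: replication can resume after this tid
    return -state         # < 0: any other state, negated


def decode_state(stored):
    """Recover the cell state from a stored value."""
    return max(0, -stored)
```

For example, `encode_cell(OUT_OF_DATE, 1234)` stores `1234`, which decodes back to state `0`, while an UP_TO_DATE cell is stored as `-1`.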
@contextmanager
def replicated(self, offset):
readable_set = self._readable_set
......@@ -492,11 +548,12 @@ class DatabaseManager(object):
"""
"""
@requires(_changePartitionTable, _getDataLastId)
def changePartitionTable(self, ptid, cell_list, reset=False):
readable_set = self._readable_set
if reset:
readable_set.clear()
@requires(_getDataLastId)
def _updateReadable(self):
try:
readable_set = self.__dict__['_readable_set']
except KeyError:
readable_set = self._readable_set = set()
np = self.getNumPartitions()
def _getPartition(x, np=np):
return x % np
......@@ -511,17 +568,80 @@ class DatabaseManager(object):
for p in xrange(np):
i = self._getDataLastId(p)
d.append(p << 48 if i is None else i + 1)
me = self.getUUID()
for offset, nid, state in cell_list:
if nid == me:
if CellStates.UP_TO_DATE != state != CellStates.FEEDING:
readable_set.discard(offset)
else:
readable_set.add(offset)
readable_set.clear()
readable_set.update(x[0] for x in self._iterAssignedCells()
if -x[1] in READABLE)
@requires(_changePartitionTable, _getLastIDs, _getLastTID)
def changePartitionTable(self, ptid, cell_list, reset=False):
my_nid = self.getUUID()
pt = dict(self._iterAssignedCells())
# In backup mode, the last transactions of a readable cell may be
# incomplete.
backup_tid = self.getBackupTID()
if backup_tid:
backup_tid = util.u64(backup_tid)
def outofdate_tid(offset):
tid = pt.get(offset, 0)
if tid >= 0:
return tid
return -tid in READABLE and (backup_tid or
max(self._getLastIDs(offset)[0],
self._getLastTID(offset))) or 0
cell_list = [(offset, nid, (
None if state == CellStates.DISCARDED else
-state if nid != my_nid or state != CellStates.OUT_OF_DATE else
outofdate_tid(offset)))
for offset, nid, state in cell_list]
self._changePartitionTable(cell_list, reset)
self._updateReadable()
assert isinstance(ptid, (int, long)), ptid
self._setConfiguration('ptid', str(ptid))
@requires(_changePartitionTable)
def updateCellTID(self, partition, tid):
t, = (t for p, t in self._iterAssignedCells() if p == partition)
if t < 0:
return
tid = util.u64(tid)
# Replicator doesn't optimize when there's no new data
# since the node went down.
if t == tid:
return
# In a backup cluster, when a storage node goes down just after
# being the first to replicate fully new transactions from upstream,
# we may end up in a special situation where an OUT_OF_DATE cell
# is actually more up-to-date than an UP_TO_DATE one.
assert t < tid or self.getBackupTID()
self._changePartitionTable([(partition, self.getUUID(), tid)])
def iterCellNextTIDs(self):
p64 = util.p64
backup_tid = self.getBackupTID()
if backup_tid:
next_tid = util.u64(backup_tid)
if next_tid:
next_tid += 1
for offset, tid in self._iterAssignedCells():
if tid >= 0: # OUT_OF_DATE
yield offset, p64(tid and tid + 1)
elif -tid in READABLE:
if backup_tid:
# An UP_TO_DATE cell does not have holes so it's fine to
# resume from the last found records.
tid = self._getLastTID(offset)
yield offset, (
# For trans, a transaction can't be partially
# replicated, so replication can resume from the next
# possible tid.
p64(max(next_tid, tid + 1) if tid else next_tid),
# For obj, the last transaction may be partially
# replicated so it must be checked again (i.e. no +1).
p64(max(next_tid, self._getLastIDs(offset)[0])))
else:
yield offset, None
@abstract
def dropPartitions(self, offset_list):
"""Delete all data for specified partitions"""
......@@ -717,7 +837,7 @@ class DatabaseManager(object):
"""
@abstract
def unlockTransaction(self, tid, ttid):
def unlockTransaction(self, tid, ttid, trans, obj):
"""Finalize a transaction by moving data to a finished area."""
@abstract
......@@ -741,9 +861,16 @@ class DatabaseManager(object):
def truncate(self):
tid = self.getTruncateTID()
if tid:
assert tid != ZERO_TID, tid
for partition in xrange(self.getNumPartitions()):
tid = util.u64(tid)
assert tid, tid
cell_list = []
my_nid = self.getUUID()
for partition, state in self._iterAssignedCells():
if state > tid:
cell_list.append((partition, my_nid, tid))
self._deleteRange(partition, tid)
if cell_list:
self._changePartitionTable(cell_list)
self._setTruncateTID(None)
self.commit()
......
......@@ -16,6 +16,7 @@
from binascii import a2b_hex
from collections import OrderedDict
from functools import wraps
import MySQLdb
from MySQLdb import DataError, IntegrityError, \
OperationalError, ProgrammingError
......@@ -33,24 +34,63 @@ import struct
import sys
import time
from . import LOG_QUERIES
from . import LOG_QUERIES, DatabaseFailure
from .manager import DatabaseManager, splitOIDField
from neo.lib import logging, util
from neo.lib.exception import DatabaseFailure
from neo.lib.interfaces import implements
from neo.lib.protocol import CellStates, ZERO_OID, ZERO_TID, ZERO_HASH
from neo.lib.protocol import ZERO_OID, ZERO_TID, ZERO_HASH
class MysqlError(DatabaseFailure):
def __init__(self, exc, query=None):
self.exc = exc
self.query = query
code = property(lambda self: self.exc.args[0])
def __str__(self):
msg = 'MySQL error %s: %s' % self.exc.args
return msg if self.query is None else '%s\nQuery: %s' % (
msg, getPrintableQuery(self.query[:1000]))
def getPrintableQuery(query, max=70):
return ''.join(c if c in string.printable and c not in '\t\x0b\x0c\r'
else '\\x%02x' % ord(c) for c in query)
def auto_reconnect(wrapped):
def wrapper(self, *args):
# Try 3 times at most. When it fails too often for the same
# query then the disconnection is likely caused by this query.
# We don't want to enter into an infinite loop.
retry = 2
while 1:
try:
return wrapped(self, *args)
except OperationalError as m:
# IDEA: Is it safe to retry in case of DISK_FULL ?
# XXX: However, this would be another case of failure that would
# be unnoticed by other nodes (ADMIN & MASTER). When
# there are replicas, it may be preferred to not retry.
if (self._active
or SERVER_GONE_ERROR != m.args[0] != SERVER_LOST
or not retry):
if self.LOCK:
raise MysqlError(m, *args)
raise # caught higher up for secondary connections
logging.info('the MySQL server is gone; reconnecting')
assert not self._deferred
self.close()
retry -= 1
return wraps(wrapped)(wrapper)
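Reviewer note: `auto_reconnect` factors the bounded retry loop out of `query` so that `_connect` can share it. The sketch below shows the same pattern in isolation. `ConnectionLost`, `FlakyClient` and `retry_on_disconnect` are hypothetical names, and the sketch leaves out the `_active` and `LOCK` checks that the real decorator performs before retrying.

```python
from functools import wraps


class ConnectionLost(Exception):
    """Hypothetical transient error standing in for a lost server."""


def retry_on_disconnect(wrapped, retries=2):
    """Retry a method after reconnecting, at most `retries` extra times."""
    @wraps(wrapped)
    def wrapper(self, *args):
        remaining = retries
        while True:
            try:
                return wrapped(self, *args)
            except ConnectionLost:
                if not remaining:
                    raise  # failing this often is likely caused by the query
                self.reconnect()
                remaining -= 1
    return wrapper


class FlakyClient(object):
    """Fails a given number of times before answering."""

    def __init__(self, failures):
        self.failures = failures
        self.reconnects = 0

    def reconnect(self):
        self.reconnects += 1

    @retry_on_disconnect
    def query(self, sql):
        if self.failures:
            self.failures -= 1
            raise ConnectionLost(sql)
        return "ok: " + sql
```

With two extra retries, a client that fails twice still answers, while one that fails three times propagates the error after two reconnections.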
@implements
class MySQLDatabaseManager(DatabaseManager):
"""This class manages a database on MySQL."""
VERSION = 2
VERSION = 3
ENGINES = "InnoDB", "RocksDB", "TokuDB"
_engine = ENGINES[0] # default engine
......@@ -65,9 +105,18 @@ class MySQLDatabaseManager(DatabaseManager):
'(?:([^:]+)(?::(.*))?@)?([^~./]+)(.+)?$', database).groups()
def _close(self):
self.conn.close()
try:
conn = self.__dict__.pop('conn')
except KeyError:
return
conn.close()
def __getattr__(self, attr):
if attr == 'conn':
self._tryConnect()
return super(MySQLDatabaseManager, self).__getattr__(attr)
def _connect(self):
def _tryConnect(self):
kwd = {'db' : self.db, 'user' : self.user}
if self.passwd is not None:
kwd['passwd'] = self.passwd
......@@ -75,6 +124,7 @@ class MySQLDatabaseManager(DatabaseManager):
kwd['unix_socket'] = os.path.expanduser(self.socket)
logging.info('connecting to MySQL on the database %s with user %s',
self.db, self.user)
self._active = 0
if self._wait < 0:
timeout_at = None
else:
......@@ -95,7 +145,6 @@ class MySQLDatabaseManager(DatabaseManager):
log = logging.exception
log('Connection to MySQL failed, retrying.')
time.sleep(1)
self._active = 0
self._config = {}
conn = self.conn
conn.autocommit(False)
......@@ -117,23 +166,48 @@ class MySQLDatabaseManager(DatabaseManager):
" Minimal value must be %uk."
% (name, self._max_allowed_packet // 1024))
self._max_allowed_packet = int(value)
try:
self._dedup = bool(query(
"SHOW INDEX FROM data WHERE key_name='hash'"))
except ProgrammingError as e:
if e.args[0] != NO_SUCH_TABLE:
raise
self._dedup = None
if not self.LOCK:
# Prevent automatic reconnection for secondary connections.
self._active = 1
self._commit = self.conn.commit
_connect = auto_reconnect(_tryConnect)
def autoReconnect(self, f):
assert self._active and not self.LOCK
@auto_reconnect
def try_once(self):
if self._active:
try:
f()
finally:
self._active = 0
return True
while not try_once(self):
# Avoid reconnecting too often.
# Since this is used to wrap an arbitrarily long process and
# not just a single query, we can't limit the number of retries.
time.sleep(5)
self._connect()
def _commit(self):
self.conn.commit()
self._active = 0
@auto_reconnect
def query(self, query):
"""Query data from a database."""
if LOG_QUERIES:
logging.debug('querying %s...',
getPrintableQuery(query.split('\n', 1)[0][:70]))
# Try 3 times at most. When it fails too often for the same
# query then the disconnection is likely caused by this query.
# We don't want to enter into an infinite loop.
retry = 2
while 1:
conn = self.conn
try:
conn.query(query)
if query.startswith("SELECT "):
r = conn.store_result()
......@@ -141,20 +215,6 @@ class MySQLDatabaseManager(DatabaseManager):
tuple([d.tostring() if isinstance(d, array) else d
for d in row])
for row in r.fetch_row(r.num_rows())])
break
except OperationalError as m:
code, m = m.args
# IDEA: Is it safe to retry in case of DISK_FULL ?
# XXX: However, this would be another case of failure that would
# be unnoticed by other nodes (ADMIN & MASTER). When
# there are replicas, it may be preferred to not retry.
if self._active or SERVER_GONE_ERROR != code != SERVER_LOST \
or not retry:
raise DatabaseFailure('MySQL error %d: %s\nQuery: %s'
% (code, m, getPrintableQuery(query[:1000])))
logging.info('the MySQL server is gone; reconnecting')
self._connect()
retry -= 1
r = query.split(None, 1)[0]
if r in ("INSERT", "REPLACE", "DELETE", "UPDATE"):
self._active = 1
......@@ -166,6 +226,11 @@ class MySQLDatabaseManager(DatabaseManager):
"""Escape special characters in a string."""
return self.conn.escape_string
def _getDevPath(self):
# BBB: MySQL is moving to Performance Schema.
return self.query("SELECT * FROM information_schema.global_variables"
" WHERE variable_name='datadir'")[0][1]
def erase(self):
self.query("DROP TABLE IF EXISTS"
" config, pt, trans, obj, data, bigdata, ttrans, tobj")
......@@ -177,20 +242,33 @@ class MySQLDatabaseManager(DatabaseManager):
if e.args[0] != NO_SUCH_TABLE:
raise
def _alterTable(self, schema_dict, table, select="*"):
q = self.query
new = 'new_' + table
if self.nonempty(table) is None:
if self.nonempty(new) is None:
return
else:
q("DROP TABLE IF EXISTS " + new)
q(schema_dict.pop(table) % new
+ " SELECT %s FROM %s" % (select, table))
q("DROP TABLE " + table)
q("ALTER TABLE %s RENAME TO %s" % (new, table))
def _migrate1(self, _):
self._checkNoUnfinishedTransactions()
self.query("DROP TABLE IF EXISTS ttrans")
def _migrate2(self, schema_dict):
q = self.query
if self.nonempty('obj') is None:
if self.nonempty('new_obj') is None:
return
else:
q("DROP TABLE IF EXISTS new_obj")
q(schema_dict.pop('obj') % 'new_obj' + " SELECT * FROM obj")
q("DROP TABLE obj")
q("ALTER TABLE new_obj RENAME TO obj")
self._alterTable(schema_dict, 'obj')
def _migrate3(self, schema_dict):
self._alterTable(schema_dict, 'pt', "rid as `partition`, nid,"
" CASE state"
" WHEN 0 THEN -1" # UP_TO_DATE
" WHEN 2 THEN -2" # FEEDING
" ELSE 1-state"
" END as tid")
def _setup(self, dedup=False):
self._config.clear()
......@@ -207,10 +285,10 @@ class MySQLDatabaseManager(DatabaseManager):
# The table "pt" stores a partition table.
schema_dict['pt'] = """CREATE TABLE %s (
rid INT UNSIGNED NOT NULL,
`partition` SMALLINT UNSIGNED NOT NULL,
nid INT NOT NULL,
state TINYINT UNSIGNED NOT NULL,
PRIMARY KEY (rid, nid)
tid BIGINT NOT NULL,
PRIMARY KEY (`partition`, nid)
) ENGINE=""" + engine
if self._use_partition:
......@@ -292,6 +370,9 @@ class MySQLDatabaseManager(DatabaseManager):
for table, schema in schema_dict.iteritems():
q(schema % ('IF NOT EXISTS ' + table))
if self._dedup is None:
self._dedup = dedup
self._uncommitted_data.update(q("SELECT data_id, count(*)"
" FROM tobj WHERE data_id IS NOT NULL GROUP BY data_id"))
......@@ -326,42 +407,23 @@ class MySQLDatabaseManager(DatabaseManager):
q("ALTER TABLE config MODIFY value VARBINARY(%s) NULL" % len(value))
q(sql)
def getPartitionTable(self, *nid):
if nid:
return self.query("SELECT rid, state FROM pt WHERE nid=%u" % nid)
def _getPartitionTable(self):
return self.query("SELECT * FROM pt")
def _getAssignedPartitionList(self):
nid = self.getUUID()
if nid is None:
return ()
return [p for p, in self.query("SELECT rid FROM pt WHERE nid=%s" % nid)]
def _sqlmax(self, sql, arg_list):
q = self.query
x = [x for x in arg_list for x, in q(sql % x) if x is not None]
if x: return max(x)
def getLastTID(self, max_tid):
return self._sqlmax(
"SELECT MAX(tid) as t FROM trans FORCE INDEX (PRIMARY)"
" WHERE tid<=%s and `partition`=%%s" % max_tid,
self._getAssignedPartitionList())
def _getLastTID(self, partition, max_tid=None):
x = "WHERE `partition`=%s" % partition
if max_tid:
x += " AND tid<=%s" % max_tid
(tid,), = self.query(
"SELECT MAX(tid) as t FROM trans FORCE INDEX (PRIMARY)" + x)
return tid
def _getLastIDs(self):
offset_list = self._getAssignedPartitionList()
p64 = util.p64
def _getLastIDs(self, partition):
q = self.query
sql = "SELECT MAX(tid) FROM %s WHERE `partition`=%s"
trans, obj = ({partition: p64(tid)
for partition in offset_list
for tid, in q(sql % (t, partition))
if tid is not None}
for t in ('trans FORCE INDEX (PRIMARY)', 'obj FORCE INDEX (tid)'))
oid = self._sqlmax(
"SELECT MAX(oid) FROM obj FORCE INDEX (PRIMARY)"
" WHERE `partition`=%s", offset_list)
return trans, obj, None if oid is None else p64(oid)
x = "WHERE `partition`=%s" % partition
(oid,), = q("SELECT MAX(oid) FROM obj FORCE INDEX (PRIMARY)" + x)
(tid,), = q("SELECT MAX(tid) FROM obj FORCE INDEX (tid)" + x)
return tid, oid
def _getDataLastId(self, partition):
return self.query("SELECT MAX(id) FROM data WHERE %s <= id AND id < %s"
......@@ -427,26 +489,26 @@ class MySQLDatabaseManager(DatabaseManager):
q = self.query
if reset:
q("DELETE FROM pt")
for offset, nid, state in cell_list:
for offset, nid, tid in cell_list:
# TODO: this logic should move out of database manager
# add 'dropCells(cell_list)' to API and use one query
if state == CellStates.DISCARDED:
q("DELETE FROM pt WHERE rid = %d AND nid = %d"
if tid is None:
q("DELETE FROM pt WHERE `partition` = %d AND nid = %d"
% (offset, nid))
else:
offset_list.append(offset)
q("INSERT INTO pt VALUES (%d, %d, %d)"
" ON DUPLICATE KEY UPDATE state = %d"
% (offset, nid, state, state))
" ON DUPLICATE KEY UPDATE tid = %d"
% (offset, nid, tid, tid))
if self._use_partition:
for offset in offset_list:
add = """ALTER TABLE %%s ADD PARTITION (
PARTITION p%u VALUES IN (%u))""" % (offset, offset)
for table in 'trans', 'obj':
try:
self.conn.query(add % table)
except OperationalError as e:
if e.args[0] != SAME_NAME_PARTITION:
self.query(add % table)
except MysqlError as e:
if e.code != SAME_NAME_PARTITION:
raise
def dropPartitions(self, offset_list):
......@@ -468,9 +530,9 @@ class MySQLDatabaseManager(DatabaseManager):
','.join(' p%u' % i for i in offset_list)
for table in 'trans', 'obj':
try:
self.conn.query(drop % table)
except OperationalError as e:
if e.args[0] != DROP_LAST_PARTITION:
self.query(drop % table)
except MysqlError as e:
if e.code != DROP_LAST_PARTITION:
raise
def _getUnfinishedDataIdList(self):
......@@ -578,6 +640,7 @@ class MySQLDatabaseManager(DatabaseManager):
if 0x1000000 <= len(data): # 16M (MEDIUMBLOB limit)
compression |= 0x80
q = self.query
if self._dedup:
for r, d in q("SELECT id, value FROM data"
" WHERE hash='%s' AND compression=%s"
% (checksum, compression)):
......@@ -647,18 +710,21 @@ class MySQLDatabaseManager(DatabaseManager):
% (u64(tid), u64(ttid)))
self.commit()
def unlockTransaction(self, tid, ttid):
def unlockTransaction(self, tid, ttid, trans, obj):
q = self.query
u64 = util.u64
tid = u64(tid)
if trans:
q("INSERT INTO trans SELECT * FROM ttrans WHERE tid=%d" % tid)
q("DELETE FROM ttrans WHERE tid=%d" % tid)
if not obj:
return
sql = " FROM tobj WHERE tid=%d" % u64(ttid)
data_id_list = [x for x, in q("SELECT data_id%s AND data_id IS NOT NULL"
% sql)]
q("INSERT INTO obj SELECT `partition`, oid, %d, data_id, value_tid %s"
% (tid, sql))
q("DELETE" + sql)
q("INSERT INTO trans SELECT * FROM ttrans WHERE tid=%d" % tid)
q("DELETE FROM ttrans WHERE tid=%d" % tid)
self.releaseData(data_id_list)
def abortTransaction(self, ttid):
......@@ -687,10 +753,10 @@ class MySQLDatabaseManager(DatabaseManager):
def _deleteRange(self, partition, min_tid=None, max_tid=None):
sql = " WHERE `partition`=%d" % partition
if min_tid:
sql += " AND %d < tid" % util.u64(min_tid)
if max_tid:
sql += " AND tid <= %d" % util.u64(max_tid)
if min_tid is not None:
sql += " AND %d < tid" % min_tid
if max_tid is not None:
sql += " AND tid <= %d" % max_tid
q = self.query
q("DELETE FROM trans" + sql)
sql = " FROM obj" + sql
......@@ -742,7 +808,7 @@ class MySQLDatabaseManager(DatabaseManager):
compression = r[1]
if compression and compression & 0x80:
return (r[0], compression & 0x7f, r[2],
''.join(self._bigData(data)), r[4])
''.join(self._bigData(r[3])), r[4])
return r
def getReplicationObjectList(self, min_tid, max_tid, length, partition,
......@@ -886,3 +952,25 @@ class MySQLDatabaseManager(DatabaseManager):
sha1(','.join(str(x[1]) for x in r)).digest(),
p64(r[-1][1]))
return 0, ZERO_HASH, ZERO_TID, ZERO_HASH, ZERO_OID
def _cmdline(self):
for x in ('u', self.user), ('p', self.passwd), ('S', self.socket):
if x[1]:
yield '-%s%s' % x
yield self.db
def dump(self):
import subprocess
cmd = ['mysqldump', '--compact', '--hex-blob']
cmd += self._cmdline()
return subprocess.check_output(cmd)
def restore(self, sql):
import subprocess
cmd = ['mysql']
cmd += self._cmdline()
p = subprocess.Popen(cmd, stdin=subprocess.PIPE)
p.communicate(sql)
retcode = p.wait()
if retcode:
raise subprocess.CalledProcessError(retcode, cmd)
......@@ -25,7 +25,7 @@ from . import LOG_QUERIES
from .manager import DatabaseManager, splitOIDField
from neo.lib import logging, util
from neo.lib.interfaces import implements
from neo.lib.protocol import CellStates, ZERO_OID, ZERO_TID, ZERO_HASH
from neo.lib.protocol import ZERO_OID, ZERO_TID, ZERO_HASH
def unique_constraint_message(table, *columns):
c = sqlite3.connect(":memory:")
......@@ -68,7 +68,7 @@ class SQLiteDatabaseManager(DatabaseManager):
never be used for small requests.
"""
VERSION = 2
VERSION = 3
def _parse(self, database):
self.db = os.path.expanduser(database)
......@@ -86,6 +86,9 @@ class SQLiteDatabaseManager(DatabaseManager):
q("PRAGMA journal_mode = MEMORY")
self._config = {}
def _getDevPath(self):
return self.db
def _commit(self):
retry_if_locked(self.conn.commit)
......@@ -113,23 +116,33 @@ class SQLiteDatabaseManager(DatabaseManager):
if not e.args[0].startswith("no such table:"):
raise
def _migrate1(self, *_):
self._checkNoUnfinishedTransactions()
self.query("DROP TABLE IF EXISTS ttrans")
def _migrate2(self, schema_dict, index_dict):
def _alterTable(self, schema_dict, table, select="*"):
# BBB: As explained in _setup, no transactional DDL
# so let's do the same dance as for MySQL.
q = self.query
if self.nonempty('obj') is None:
if self.nonempty('new_obj') is None:
new = 'new_' + table
if self.nonempty(table) is None:
if self.nonempty(new) is None:
return
else:
q("DROP TABLE IF EXISTS new_obj")
q(schema_dict.pop('obj') % 'new_obj')
q("INSERT INTO new_obj SELECT * FROM obj")
q("DROP TABLE obj")
q("ALTER TABLE new_obj RENAME TO obj")
q("DROP TABLE IF EXISTS " + new)
q(schema_dict.pop(table) % new)
q("INSERT INTO %s SELECT %s FROM %s" % (new, select, table))
q("DROP TABLE " + table)
q("ALTER TABLE %s RENAME TO %s" % (new, table))
def _migrate1(self, *_):
self._checkNoUnfinishedTransactions()
self.query("DROP TABLE IF EXISTS ttrans")
def _migrate2(self, schema_dict, index_dict):
self._alterTable(schema_dict, 'obj')
def _migrate3(self, schema_dict, index_dict):
self._alterTable(schema_dict, 'pt', "rid, nid, CASE state"
" WHEN 0 THEN -1" # UP_TO_DATE
" WHEN 2 THEN -2" # FEEDING
" ELSE 1-state END")
def _setup(self, dedup=False):
# BBB: SQLite has transactional DDL but before Python 3.6,
......@@ -150,10 +163,10 @@ class SQLiteDatabaseManager(DatabaseManager):
# The table "pt" stores a partition table.
schema_dict['pt'] = """CREATE TABLE %s (
rid INTEGER NOT NULL,
partition INTEGER NOT NULL,
nid INTEGER NOT NULL,
state INTEGER NOT NULL,
PRIMARY KEY (rid, nid))
tid INTEGER NOT NULL,
PRIMARY KEY (partition, nid))
"""
# The table "trans" stores information on committed transactions.
......@@ -223,7 +236,8 @@ class SQLiteDatabaseManager(DatabaseManager):
for table, schema in schema_dict.iteritems():
q(schema % ('IF NOT EXISTS ' + table))
for i, index in enumerate(index_dict.get(table, ()), 1):
for table, index in index_dict.iteritems():
for i, index in enumerate(index, 1):
q(index % ('IF NOT EXISTS _%s_i%s' % (table, i), table))
self._uncommitted_data.update(q("SELECT data_id, count(*)"
......@@ -249,42 +263,23 @@ class SQLiteDatabaseManager(DatabaseManager):
else:
q("REPLACE INTO config VALUES (?,?)", (key, str(value)))
def getPartitionTable(self, *nid):
if nid:
return self.query("SELECT rid, state FROM pt WHERE nid=?", nid)
def _getPartitionTable(self):
return self.query("SELECT * FROM pt")
# A test with a table of 20 million lines and SQLite 3.8.7.1 shows that
# it's not worth changing getLastTID:
# - It already returns the result in less than 2 seconds, without reading
# the whole table (this is 4-7 times faster than MySQL).
# - Strangely, a "GROUP BY partition" clause makes SQLite almost twice
# slower.
# - Getting MAX(tid) is immediate with a "AND partition=?" condition so one
# way to speed up the following 2 methods is to repeat the queries for
# each partition (and finish in Python with max() for getLastTID).
def getLastTID(self, max_tid):
return self.query(
"SELECT MAX(tid) FROM pt, trans"
" WHERE nid=? AND rid=partition AND tid<=?",
(self.getUUID(), max_tid,)).next()[0]
def _getLastIDs(self):
p64 = util.p64
def _getLastTID(self, partition, max_tid=None):
x = self.query
if max_tid is None:
x = x("SELECT MAX(tid) FROM trans WHERE partition=?", (partition,))
else:
x = x("SELECT MAX(tid) FROM trans WHERE partition=? AND tid<=?",
(partition, max_tid))
return x.next()[0]
def _getLastIDs(self, *args):
q = self.query
args = self.getUUID(),
trans = {partition: p64(tid)
for partition, tid in q(
"SELECT partition, MAX(tid) FROM pt, trans"
" WHERE nid=? AND rid=partition GROUP BY partition", args)}
obj = {partition: p64(tid)
for partition, tid in q(
"SELECT partition, MAX(tid) FROM pt, obj"
" WHERE nid=? AND rid=partition GROUP BY partition", args)}
oid = q("SELECT MAX(oid) oid FROM pt, obj"
" WHERE nid=? AND rid=partition", args).next()[0]
return trans, obj, None if oid is None else p64(oid)
(oid,), = q("SELECT MAX(oid) FROM obj WHERE `partition`=?", args)
(tid,), = q("SELECT MAX(tid) FROM obj WHERE `partition`=?", args)
return tid, oid
def _getDataLastId(self, partition):
return self.query("SELECT MAX(id) FROM data WHERE %s <= id AND id < %s"
......@@ -352,8 +347,8 @@ class SQLiteDatabaseManager(DatabaseManager):
# whereas we try to replace only 1 value ?
# We don't want to remove the 'NOT NULL' constraint
# so we must simulate a "REPLACE OR FAIL".
q("DELETE FROM pt WHERE rid=? AND nid=?", (offset, nid))
if state != CellStates.DISCARDED:
q("DELETE FROM pt WHERE partition=? AND nid=?", (offset, nid))
if state is not None:
q("INSERT OR FAIL INTO pt VALUES (?,?,?)",
(offset, nid, int(state)))
......@@ -478,10 +473,15 @@ class SQLiteDatabaseManager(DatabaseManager):
(u64(tid), u64(ttid)))
self.commit()
def unlockTransaction(self, tid, ttid):
def unlockTransaction(self, tid, ttid, trans, obj):
q = self.query
u64 = util.u64
tid = u64(tid)
if trans:
q("INSERT INTO trans SELECT * FROM ttrans WHERE tid=?", (tid,))
q("DELETE FROM ttrans WHERE tid=?", (tid,))
if not obj:
return
ttid = u64(ttid)
sql = " FROM tobj WHERE tid=?"
data_id_list = [x for x, in q("SELECT data_id%s AND data_id IS NOT NULL"
......@@ -489,8 +489,6 @@ class SQLiteDatabaseManager(DatabaseManager):
q("INSERT INTO obj SELECT partition, oid, ?, data_id, value_tid" + sql,
(tid, ttid))
q("DELETE" + sql, (ttid,))
q("INSERT INTO trans SELECT * FROM ttrans WHERE tid=?", (tid,))
q("DELETE FROM ttrans WHERE tid=?", (tid,))
self.releaseData(data_id_list)
def abortTransaction(self, ttid):
......@@ -520,12 +518,12 @@ class SQLiteDatabaseManager(DatabaseManager):
def _deleteRange(self, partition, min_tid=None, max_tid=None):
sql = " WHERE partition=?"
args = [partition]
if min_tid:
if min_tid is not None:
sql += " AND ? < tid"
args.append(util.u64(min_tid))
if max_tid:
args.append(min_tid)
if max_tid is not None:
sql += " AND tid <= ?"
args.append(util.u64(max_tid))
args.append(max_tid)
q = self.query
q("DELETE FROM trans" + sql, args)
sql = " FROM obj" + sql
......@@ -693,3 +691,24 @@ class SQLiteDatabaseManager(DatabaseManager):
sha1(','.join(str(x[1]) for x in r)).digest(),
p64(r[-1][1]))
return 0, ZERO_HASH, ZERO_TID, ZERO_HASH, ZERO_OID
def dump(self):
main = []
data = []
for line in self.conn.iterdump():
if line.startswith('INSERT '):
assert line.endswith(';'), line
data.append(line)
continue
if line.startswith('CREATE TABLE '):
# ALTER TABLE adds quotes.
create, table, name, tail = line.split(' ', 3)
line = ' '.join((create, table, name.strip('"'), tail))
main.append(line)
assert line == 'COMMIT;', line
data.sort()
main[-1:-1] = data
return '\n'.join(main) + '\n'
def restore(self, sql):
self.conn.executescript(sql)
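The `_alterTable` / `_migrate3` pair above rebuilds the `pt` table and remaps the old `state` column through a `CASE` expression. The steps can be sketched against an in-memory SQLite database as follows. This is a sketch under stated assumptions: `alter_table` is a hypothetical standalone helper, the sample rows are made up, and the `partition` column is quoted here only as a precaution.

```python
import sqlite3

def alter_table(conn, schema, table, select="*"):
    """Rebuild 'table' with a new schema, copying rows through 'select'."""
    new = "new_" + table
    conn.execute("DROP TABLE IF EXISTS " + new)
    conn.execute(schema % new)
    conn.execute("INSERT INTO %s SELECT %s FROM %s" % (new, select, table))
    conn.execute("DROP TABLE " + table)
    conn.execute("ALTER TABLE %s RENAME TO %s" % (new, table))

conn = sqlite3.connect(":memory:")
# Old layout (rid, nid, state), with made-up rows for states 0, 1, 2 and 4.
conn.execute("CREATE TABLE pt (rid INTEGER NOT NULL, nid INTEGER NOT NULL,"
             " state INTEGER NOT NULL, PRIMARY KEY (rid, nid))")
conn.executemany("INSERT INTO pt VALUES (?,?,?)",
                 [(0, 1, 0), (1, 1, 1), (2, 1, 2), (3, 1, 4)])
# New layout (partition, nid, tid).
new_schema = ('CREATE TABLE %s ("partition" INTEGER NOT NULL,'
              ' nid INTEGER NOT NULL, tid INTEGER NOT NULL,'
              ' PRIMARY KEY ("partition", nid))')
alter_table(conn, new_schema, "pt",
            "rid, nid, CASE state WHEN 0 THEN -1 WHEN 2 THEN -2"
            " ELSE 1-state END")
rows = conn.execute('SELECT * FROM pt ORDER BY "partition"').fetchall()
# rows == [(0, 1, -1), (1, 1, 0), (2, 1, -2), (3, 1, -3)]
```

Old state 0 becomes -1 and old state 2 becomes -2, as the comments in `_migrate3` indicate; every other value goes through `1-state`.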
......@@ -42,11 +42,11 @@ class ClientOperationHandler(BaseHandler):
# for read rpc
return self.app.tm.read_queue
def askObject(self, conn, oid, serial, tid):
def askObject(self, conn, oid, at, before):
app = self.app
if app.tm.loadLocked(oid):
raise DelayEvent
o = app.dm.getObject(oid, serial, tid)
o = app.dm.getObject(oid, at, before)
try:
serial, next_serial, compression, checksum, data, data_serial = o
except TypeError:
......
......@@ -32,7 +32,7 @@ class IdentificationHandler(EventHandler):
return self.app.nm
def requestIdentification(self, conn, node_type, uuid, address, name,
id_timestamp):
devpath, id_timestamp):
self.checkClusterName(name)
app = self.app
# reject any incoming connections if not ready
......
......@@ -28,21 +28,21 @@ class InitializationHandler(BaseMasterHandler):
raise ProtocolError('Partial partition table received')
# Install the partition table into the database for persistence.
cell_list = []
offset_list = xrange(pt.getPartitions())
unassigned_set = set(offset_list)
for offset in offset_list:
unassigned = range(pt.getPartitions())
for offset in reversed(unassigned):
for cell in pt.getCellList(offset):
cell_list.append((offset, cell.getUUID(), cell.getState()))
if cell.getUUID() == app.uuid:
unassigned_set.remove(offset)
unassigned.remove(offset)
# delete objects database
dm = app.dm
if unassigned_set:
if unassigned:
if app.disable_drop_partitions:
logging.info("don't drop data for partitions %r", unassigned_set)
logging.info('partitions %r are discarded but actual deletion'
' of data is disabled', unassigned)
else:
logging.debug('drop data for partitions %r', unassigned_set)
dm.dropPartitions(unassigned_set)
logging.debug('drop data for partitions %r', unassigned)
dm.dropPartitions(unassigned)
dm.changePartitionTable(ptid, cell_list, reset=True)
dm.commit()
......@@ -63,7 +63,7 @@ class InitializationHandler(BaseMasterHandler):
def askLastIDs(self, conn):
dm = self.app.dm
dm.truncate()
ltid, _, _, loid = dm.getLastIDs()
ltid, loid = dm.getLastIDs()
conn.answer(Packets.AnswerLastIDs(loid, ltid))
def askPartitionTable(self, conn):
......@@ -77,18 +77,10 @@ class InitializationHandler(BaseMasterHandler):
def validateTransaction(self, conn, ttid, tid):
dm = self.app.dm
dm.lockTransaction(tid, ttid)
dm.unlockTransaction(tid, ttid)
dm.unlockTransaction(tid, ttid, True, True)
dm.commit()
def startOperation(self, conn, backup):
self.app.operational = True
# XXX: see comment in protocol
dm = self.app.dm
if backup:
if dm.getBackupTID():
return
tid = dm.getLastIDs()[0] or ZERO_TID
else:
tid = None
dm._setBackupTID(tid)
dm.commit()
self.app.operational = True
self.app.replicator.startOperation(backup)
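The rewritten loop in `InitializationHandler` removes offsets from the `unassigned` list while iterating over it in reverse. A minimal sketch of why this is safe, using a hypothetical `unassigned_partitions` helper and a plain set of assigned offsets in place of the partition table:

```python
def unassigned_partitions(partition_count, assigned):
    """Return the offsets in range(partition_count) that are not assigned."""
    unassigned = list(range(partition_count))
    # Iterating backwards means a removal only shifts elements that
    # were already visited, so no offset is skipped.
    for offset in reversed(unassigned):
        if offset in assigned:
            unassigned.remove(offset)
    return unassigned

result = unassigned_partitions(6, {1, 4, 5})
# result == [0, 2, 3]
```

A forward loop over the same list would skip the element following each removal, which is presumably why the diff iterates with `reversed()`.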
......@@ -26,10 +26,7 @@ class MasterOperationHandler(BaseMasterHandler):
def startOperation(self, conn, backup):
# XXX: see comment in protocol
assert self.app.operational and backup
dm = self.app.dm
if not dm.getBackupTID():
dm._setBackupTID(dm.getLastIDs()[0] or ZERO_TID)
dm.commit()
self.app.replicator.startOperation(backup)
def askLockInformation(self, conn, ttid, tid):
self.app.tm.lock(ttid, tid)
......
......@@ -75,9 +75,6 @@ class StorageOperationHandler(EventHandler):
deleteTransaction(tid)
assert not pack_tid, "TODO"
if next_tid:
# More than one chunk ? This could be a full replication so avoid
# restarting from the beginning by committing now.
self.app.dm.commit()
self.app.replicator.fetchTransactions(next_tid)
else:
self.app.replicator.fetchObjects()
......@@ -97,15 +94,12 @@ class StorageOperationHandler(EventHandler):
for serial, oid_list in object_dict.iteritems():
for oid in oid_list:
deleteObject(oid, serial)
# XXX: It should be possible not to commit here if it was the last
# chunk, because we'll either commit again when updating
# 'backup_tid' or the partition table.
self.app.dm.commit()
assert not pack_tid, "TODO"
if next_tid:
# TODO also provide feedback to master about current replication state (tid)
self.app.replicator.fetchObjects(next_tid, next_oid)
else:
# This will also commit.
self.app.replicator.finish()
@checkConnectionIsReplicatorConnection
......@@ -267,6 +261,8 @@ class StorageOperationHandler(EventHandler):
"partition %u dropped or truncated"
% partition), msg_id)
return
if not object[2]: # creation undone
object = object[0], 0, ZERO_HASH, '', object[4]
# Same as in askFetchTransactions.
conn.send(Packets.AddObject(oid, *object), msg_id)
yield conn.buffering
......
......@@ -93,7 +93,7 @@ from neo.lib import logging
from neo.lib.protocol import CellStates, NodeTypes, NodeStates, \
Packets, INVALID_TID, ZERO_TID, ZERO_OID
from neo.lib.connection import ClientConnection, ConnectionClosed
from neo.lib.util import add64, dump
from neo.lib.util import add64, dump, p64
from .handlers.storage import StorageOperationHandler
FETCH_COUNT = 1000
......@@ -190,41 +190,50 @@ class Replicator(object):
return add64(tid, -1)
return ZERO_TID
def updateBackupTID(self):
def updateBackupTID(self, commit=False):
dm = self.app.dm
tid = dm.getBackupTID()
if tid:
new_tid = self.getBackupTID()
if tid != new_tid:
dm._setBackupTID(new_tid)
if commit:
dm.commit()
def startOperation(self, backup):
dm = self.app.dm
if backup:
if dm.getBackupTID():
assert not hasattr(self, 'partition_dict'), self.partition_dict
return
tid = dm.getLastIDs()[0] or ZERO_TID
else:
tid = None
dm._setBackupTID(tid)
dm.commit()
try:
partition_dict = self.partition_dict
except AttributeError:
return
for offset, next_tid in dm.iterCellNextTIDs():
if type(next_tid) is not bytes: # readable
p = partition_dict[offset]
p.next_trans, p.next_obj = next_tid
def populate(self):
app = self.app
pt = app.pt
uuid = app.uuid
self.partition_dict = {}
self.replicate_dict = {}
self.source_dict = {}
self.ttid_set = set()
last_tid, last_trans_dict, last_obj_dict, _ = app.dm.getLastIDs()
next_tid = app.dm.getBackupTID() or last_tid
next_tid = add64(next_tid, 1) if next_tid else ZERO_TID
outdated_list = []
for offset in xrange(pt.getPartitions()):
for cell in pt.getCellList(offset):
if cell.getUUID() == uuid and not cell.isCorrupted():
for offset, next_tid in self.app.dm.iterCellNextTIDs():
self.partition_dict[offset] = p = Partition()
if cell.isOutOfDate():
if type(next_tid) is bytes: # OUT_OF_DATE
outdated_list.append(offset)
try:
p.next_trans = add64(last_trans_dict[offset], 1)
except KeyError:
p.next_trans = ZERO_TID
p.next_obj = last_obj_dict.get(offset, ZERO_TID)
p.max_ttid = INVALID_TID
else:
p.next_trans = p.next_obj = next_tid
p.max_ttid = INVALID_TID
else: # readable
p.next_trans, p.next_obj = next_tid or (None, None)
p.max_ttid = None
if outdated_list:
self.app.tm.replicating(outdated_list)
......@@ -236,7 +245,6 @@ class Replicator(object):
discarded_list = []
readable_list = []
app = self.app
last_tid, last_trans_dict, last_obj_dict, _ = app.dm.getLastIDs()
for offset, uuid, state in cell_list:
if uuid == app.uuid:
if state in (CellStates.DISCARDED, CellStates.CORRUPTED):
......@@ -251,11 +259,9 @@ class Replicator(object):
elif state == CellStates.OUT_OF_DATE:
assert offset not in self.partition_dict
self.partition_dict[offset] = p = Partition()
try:
p.next_trans = add64(last_trans_dict[offset], 1)
except KeyError:
p.next_trans = ZERO_TID
p.next_obj = last_obj_dict.get(offset, ZERO_TID)
# New cell. 0 is also what should be stored by the backend.
# Nothing to optimize.
p.next_trans = p.next_obj = ZERO_TID
p.max_ttid = INVALID_TID
added_list.append(offset)
else:
......@@ -289,7 +295,7 @@ class Replicator(object):
next_tid = add64(tid, 1)
p.next_trans = p.next_obj = next_tid
if next_tid:
self.updateBackupTID()
self.updateBackupTID(True)
self._nextPartition()
def _nextPartitionSortKey(self, offset):
......@@ -344,7 +350,7 @@ class Replicator(object):
try:
conn.ask(Packets.RequestIdentification(NodeTypes.STORAGE,
None if name else app.uuid, app.server, name or app.name,
app.id_timestamp))
(), app.id_timestamp))
except ConnectionClosed:
if previous_node is self.current_node:
return
......@@ -360,6 +366,9 @@ class Replicator(object):
offset = self.current_partition
p = self.partition_dict[offset]
if min_tid:
# More than one chunk ? This could be a full replication so avoid
# restarting from the beginning by committing now.
self.app.dm.commit()
p.next_trans = min_tid
else:
try:
......@@ -384,13 +393,17 @@ class Replicator(object):
offset = self.current_partition
p = self.partition_dict[offset]
max_tid = self.replicate_tid
dm = self.app.dm
if min_tid:
p.next_obj = min_tid
self.updateBackupTID()
dm.updateCellTID(offset, add64(min_tid, -1))
dm.commit() # like in fetchTransactions
else:
min_tid = p.next_obj
p.next_trans = add64(max_tid, 1)
object_dict = {}
for serial, oid in self.app.dm.getReplicationObjectList(min_tid,
for serial, oid in dm.getReplicationObjectList(min_tid,
max_tid, FETCH_COUNT, offset, min_oid):
try:
object_dict[serial].append(oid)
......@@ -406,11 +419,14 @@ class Replicator(object):
p = self.partition_dict[offset]
p.next_obj = add64(tid, 1)
self.updateBackupTID()
app = self.app
app.dm.updateCellTID(offset, tid)
app.dm.commit()
if p.max_ttid or offset in self.replicate_dict and \
offset not in self.source_dict:
logging.debug("unfinished transactions: %r", self.ttid_set)
else:
self.app.tm.replicated(offset, tid)
app.tm.replicated(offset, tid)
logging.debug("partition %u replicated up to %s from %r",
offset, dump(tid), self.current_node)
self.getCurrentConnection().setReconnectionNoDelay()
......
#
# Copyright (C) 2018 Nexedi SA
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
from msgpack import Packer, Unpacker
class Queue(object):
"""Unidirectional pipe for asynchronous and fast exchange of big amounts
of data between 2 processes.
It is implemented using shared memory, a few locks and msgpack
serialization. While the latter is faster than C pickle, it was mainly
chosen for its streaming API while deserializing, which greatly reduces
the locking overhead for the consumer process.
There is no mechanism to end a communication, so this information must be
in the exchanged data, for example by choosing a marker object like None:
- the last object sent by the producer is this marker
- the consumer stops iterating when it gets this marker
As long as there are data being exchanged, the 2 processes can't change
roles (producer/consumer).
"""
def __init__(self, max_size):
from multiprocessing import Lock, RawArray, RawValue
self._max_size = max_size
self._array = RawArray('c', max_size)
self._pos = RawValue('L')
self._size = RawValue('L')
self._locks = Lock(), Lock(), Lock()
def __repr__(self):
return "<%s pos=%s size=%s max_size=%s>" % (self.__class__.__name__,
self._pos.value, self._size.value, self._max_size)
def __iter__(self):
"""Iterate endlessly over all objects sent by the producer
Internally, this method uses a receiving buffer that is lost if
interrupted (GeneratorExit). If this buffer was not empty, the queue
is left in an inconsistent state and this method can't be called again.
So the correct way to split a loop is to first get an iterator
explicitly:
iq = iter(queue)
for x in iq:
if ...:
break
for x in iq:
...
"""
unpacker = Unpacker(use_list=False, raw=True)
feed = unpacker.feed
max_size = self._max_size
array = self._array
pos = self._pos
size = self._size
lock, get_lock, put_lock = self._locks
left = 0
while 1:
for data in unpacker:
yield data
while 1:
with lock:
p = pos.value
s = size.value
if s:
break
get_lock.acquire()
e = p + s
if e < max_size:
feed(array[p:e])
else:
feed(array[p:])
e -= max_size
feed(array[:e])
with lock:
pos.value = e
n = size.value
size.value = n - s
if n == max_size:
put_lock.acquire(0)
put_lock.release()
def __call__(self, iterable):
"""Fill the queue with given objects
Hoping that msgpack.Packer gets a streaming API, 'iterable' should not
be split (i.e. this method should be called only once, like __iter__).
"""
pack = Packer(use_bin_type=True).pack
max_size = self._max_size
array = self._array
pos = self._pos
size = self._size
lock, get_lock, put_lock = self._locks
left = 0
for data in iterable:
data = pack(data)
n = len(data)
i = 0
while 1:
if not left:
while 1:
with lock:
p = pos.value
j = size.value
left = max_size - j
if left:
break
put_lock.acquire()
p += j
if p >= max_size:
p -= max_size
e = min(p + min(n, left), max_size)
j = e - p
array[p:e] = data[i:i+j]
n -= j
i += j
with lock:
p = pos.value
s = size.value
j += s
size.value = j
if not s:
get_lock.acquire(0)
get_lock.release()
p += j
if p >= max_size:
p -= max_size
left = max_size - j
if not n:
break
def test(self):
import multiprocessing, random, sys, threading
from traceback import print_tb
r = range(50)
random.shuffle(r)
for P in threading.Thread, multiprocessing.Process:
q = Queue(23)
def t():
for n in xrange(len(r)):
yield '.' * n
yield
for n in r:
yield '.' * n
i = j = 0
p = P(target=q, args=(t(),))
p.daemon = 1
p.start()
try:
q = iter(q)
for i, x in enumerate(q):
if x is None:
break
self.assertEqual(x, '.' * i)
self.assertEqual(i, len(r))
for j in r:
self.assertEqual(next(q), '.' * j)
except KeyboardInterrupt:
print_tb(sys.exc_info()[2])
self.fail((i, j))
p.join()
if __name__ == '__main__':
import unittest
unittest.TextTestRunner().run(type('', (unittest.TestCase,), {
'runTest': test})())
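The `Queue` docstrings above describe two conventions: the producer ends a batch with a marker object such as `None`, and the consumer keeps one explicit iterator so that a loop can be split without losing buffered data. A minimal sketch of the consumer side, assuming a plain Python iterator as a stand-in for `iter(queue)` (the helper name `consume_until_marker` is hypothetical):

```python
def consume_until_marker(iterator, marker=None):
    """Collect items from 'iterator' until 'marker' is seen."""
    items = []
    for x in iterator:
        if x is marker:
            break
        items.append(x)
    return items

# Stand-in for 'iq = iter(queue)': two batches, each ended by None.
iq = iter(['a', 'b', None, 'c', 'd', None])
first = consume_until_marker(iq)   # stops at the first marker
second = consume_until_marker(iq)  # resumes where the first loop stopped
```

The iterator is obtained once and reused for both loops, which is the pattern the `__iter__` docstring recommends.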
......@@ -314,12 +314,15 @@ class TransactionManager(EventQueue):
Unlock transaction
"""
try:
tid = self._transaction_dict[ttid].tid
transaction = self._transaction_dict[ttid]
except KeyError:
raise ProtocolError("unknown ttid %s" % dump(ttid))
tid = transaction.tid
logging.debug('Unlock TXN %s (ttid=%s)', dump(tid), dump(ttid))
dm = self._app.dm
dm.unlockTransaction(tid, ttid)
dm.unlockTransaction(tid, ttid,
transaction.voted == 2,
transaction.store_dict)
self._app.em.setTimeout(time() + 1, dm.deferCommit())
self.abort(ttid, even_if_locked=True)
......@@ -521,7 +524,6 @@ class TransactionManager(EventQueue):
assert not even_if_locked
# See how the master processes AbortTransaction from the client.
return
logging.debug('Abort TXN %s', dump(ttid))
transaction = self._transaction_dict[ttid]
locked = transaction.tid
# if the transaction is locked, ensure we can drop it
......@@ -529,6 +531,7 @@ class TransactionManager(EventQueue):
if not even_if_locked:
return
else:
logging.debug('Abort TXN %s', dump(ttid))
dm = self._app.dm
dm.abortTransaction(ttid)
dm.releaseData([x[1] for x in transaction.store_dict.itervalues()],
......
......@@ -28,6 +28,7 @@ import weakref
import MySQLdb
import transaction
from ConfigParser import SafeConfigParser
from cStringIO import StringIO
try:
from ZODB._compat import Unpickler
......@@ -155,8 +156,22 @@ def setupMySQLdb(db_list, user=DB_USER, password='', clear_databases=True):
conn.commit()
conn.close()
def ImporterConfigParser(adapter, zodb, **kw):
cfg = SafeConfigParser()
cfg.add_section("neo")
cfg.set("neo", "adapter", adapter)
for x in kw.iteritems():
cfg.set("neo", *x)
for name, zodb in zodb:
cfg.add_section(name)
for x in zodb.iteritems():
cfg.set(name, *x)
return cfg
class NeoTestBase(unittest.TestCase):
maxDiff = None
def setUp(self):
logging.name = self.setupLog()
unittest.TestCase.setUp(self)
......@@ -175,6 +190,8 @@ class NeoTestBase(unittest.TestCase):
# Note we don't even abort them because it may require a valid
# connection to a master node (see Storage.sync()).
transaction.manager.__init__()
if logging._max_size is not None:
logging.flush()
class failureException(AssertionError):
def __init__(self, msg=None):
......
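The `ImporterConfigParser` helper added above builds a `[neo]` section plus one section per ZODB to import. A minimal sketch of the same layout, written so that it runs on both Python 2 and Python 3. The name `importer_config`, the file path and the `writeback` option are illustrative assumptions, not taken from the diff:

```python
try:
    from configparser import ConfigParser  # Python 3
except ImportError:
    from ConfigParser import SafeConfigParser as ConfigParser  # Python 2

def importer_config(adapter, zodb, **kw):
    """Build a config with a 'neo' section and one section per ZODB."""
    cfg = ConfigParser()
    cfg.add_section("neo")
    cfg.set("neo", "adapter", adapter)
    for key, value in kw.items():
        cfg.set("neo", key, value)
    for name, options in zodb:
        cfg.add_section(name)
        for key, value in options.items():
            cfg.set(name, key, value)
    return cfg

# Hypothetical values, for illustration only.
cfg = importer_config(
    "SQLite",
    [("root", {"storage": "<filestorage>\npath /tmp/Data.fs\n</filestorage>"})],
    writeback="true")
```

Passing the ZODB list as a keyword-style mapping is what lets the functional test call it as `ImporterConfigParser(adapter, **importer)`.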
......@@ -21,6 +21,7 @@ from .. import NeoUnitTestBase, buildUrlFromString
from neo.client.app import Application
from neo.client.cache import test as testCache
from neo.client.exception import NEOStorageError
from neo.lib.util import p64
class ClientApplicationTests(NeoUnitTestBase):
......@@ -51,9 +52,7 @@ class ClientApplicationTests(NeoUnitTestBase):
def makeOID(self, value=None):
from random import randint
if value is None:
value = randint(1, 255)
return '\00' * 7 + chr(value)
return p64(randint(1, 255) if value is None else value)
makeTID = makeOID
def makeTransactionObject(self, user='u', description='d', _extension='e'):
......
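The change to `makeOID` above replaces a hand-built 8-byte string, limited to values below 256, with `p64`. As a rough sketch, `p64`/`u64` behave like 64-bit big-endian packing; the `struct`-based versions below are stand-ins written for illustration, not the optimized implementation in `neo.lib.util`:

```python
import struct

def p64(n):
    """Pack an integer as an 8-byte big-endian string (stand-in)."""
    return struct.pack('>Q', n)

def u64(s):
    """Unpack an 8-byte big-endian string into an integer (stand-in)."""
    return struct.unpack('>Q', s)[0]

small = p64(200)  # same bytes as the old 7 NUL bytes + chr(200) form
big = p64(256)    # not representable with a single trailing byte
```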
......@@ -221,7 +221,7 @@ class ClusterPdb(object):
def wait(self, test, timeout):
end_time = time() + timeout
period = 0.1
period = 0.01
while not test():
cluster_dict.acquire()
try:
......@@ -232,7 +232,6 @@ class ClusterPdb(object):
next_sleep = max(last_pdb + timeout, end_time) - time()
if next_sleep > period:
next_sleep = period
period *= 1.5
elif next_sleep < 0:
return False
finally:
......
#
# Copyright (C) 2014-2017 Nexedi SA
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
import os, stat, time
from persistent import Persistent
from BTrees.OOBTree import OOBTree
class Inode(OOBTree):
data = None
def __init__(self, up=None, mode=stat.S_IFDIR):
self[os.pardir] = self if up is None else up
self.mode = mode
self.mtime = time.time()
def __getstate__(self):
return Persistent.__getstate__(self), OOBTree.__getstate__(self)
def __setstate__(self, state):
Persistent.__setstate__(self, state[0])
OOBTree.__setstate__(self, state[1])
def edit(self, data=None, mtime=None):
fmt = stat.S_IFMT(self.mode)
if data is None:
assert fmt == stat.S_IFDIR, oct(fmt)
else:
assert fmt == stat.S_IFREG or fmt == stat.S_IFLNK, oct(fmt)
if self.data != data:
self.data = data
if self.mtime != mtime:
self.mtime = mtime or time.time()
def root(self):
try:
self = self[os.pardir]
except KeyError:
return self
return self.root()
def traverse(self, path, followlinks=True):
path = iter(path.split(os.sep) if isinstance(path, basestring) and path
else path)
for d in path:
if not d:
return self.root().traverse(path, followlinks)
if d != os.curdir:
d = self[d]
if followlinks and stat.S_ISLNK(d.mode):
d = self.traverse(d.data, True)
return d.traverse(path, followlinks)
return self
def inodeFromFs(self, path):
s = os.lstat(path)
mode = s.st_mode
name = os.path.basename(path)
try:
i = self[name]
assert stat.S_IFMT(i.mode) == stat.S_IFMT(mode)
changed = False
except KeyError:
i = self[name] = self.__class__(self, mode)
changed = True
i.edit(open(path).read() if stat.S_ISREG(mode) else
os.readlink(path) if stat.S_ISLNK(mode) else
None, s.st_mtime)
return changed or i._p_changed
def treeFromFs(self, path, yield_interval=None, filter=None):
prefix_len = len(path) + len(os.sep)
n = 0
for dirpath, dirnames, filenames in os.walk(path):
inodeFromFs = self.traverse(dirpath[prefix_len:]).inodeFromFs
for names in dirnames, filenames:
skipped = []
for j, name in enumerate(names):
p = os.path.join(dirpath, name)
if filter and not filter(p[prefix_len:]):
skipped.append(j)
elif inodeFromFs(p):
n += 1
if n == yield_interval:
n = 0
yield self
while skipped:
del names[skipped.pop()]
if n:
yield self
def walk(self):
s = [(None, self)]
while s:
top, self = s.pop()
dirs = []
nondirs = []
for name, inode in self.iteritems():
if name != os.pardir:
(dirs if stat.S_ISDIR(inode.mode) else nondirs).append(name)
yield top or os.curdir, dirs, nondirs
for name in dirs:
s.append((os.path.join(top, name) if top else name, self[name]))
......@@ -29,17 +29,16 @@ import tempfile
import traceback
import threading
import psutil
from ConfigParser import SafeConfigParser
import neo.scripts
from neo.neoctl.neoctl import NeoCTL, NotReadyException
from neo.lib import logging
from neo.lib.protocol import ClusterStates, NodeTypes, CellStates, NodeStates, \
UUID_NAMESPACES
from neo.lib.util import dump
from neo.lib.util import dump, setproctitle
from .. import (ADDRESS_TYPE, DB_SOCKET, DB_USER, IP_VERSION_FORMAT_DICT, SSL,
buildUrlFromString, cluster, getTempDirectory, NeoTestBase, Patch,
setupMySQLdb)
buildUrlFromString, cluster, getTempDirectory, setupMySQLdb,
ImporterConfigParser, NeoTestBase, Patch)
from neo.client.Storage import Storage
from neo.storage.database import manager, buildDatabaseManager
......@@ -116,36 +115,31 @@ class PortAllocator(object):
__del__ = release
class NEOProcess(object):
class Process(object):
_coverage_fd = None
_coverage_prefix = os.path.join(getTempDirectory(), 'coverage-')
_coverage_index = 0
pid = 0
def __init__(self, command, uuid, arg_dict):
try:
__import__('neo.scripts.' + command, level=0)
except ImportError:
raise NotFound, '%s not found' % (command)
def __init__(self, command, arg_dict={}):
self.command = command
self.arg_dict = arg_dict
self.with_uuid = True
self.setUUID(uuid)
def start(self, with_uuid=True):
# Prevent starting when already forked and wait wasn't called.
if self.pid != 0:
raise AlreadyRunning, 'Already running with PID %r' % (self.pid, )
command = self.command
def _args(self):
args = []
self.with_uuid = with_uuid
for arg, param in self.arg_dict.iteritems():
args.append('--' + arg)
if param is not None:
args.append(str(param))
if with_uuid:
args += '--uuid', str(self.uuid)
return args
def start(self):
# Prevent starting when already forked and wait wasn't called.
if self.pid != 0:
raise AlreadyRunning('Already running with PID %r' % self.pid)
command = self.command
args = self._args()
global coverage
if coverage:
cls = self.__class__
......@@ -159,7 +153,7 @@ class NEOProcess(object):
if args:
os.close(w)
os.kill(os.getpid(), signal.SIGSTOP)
self.pid = logging.fork()
self.pid = os.fork()
if self.pid:
# Wait until the signal to kill the child is set up.
os.close(w)
......@@ -179,7 +173,8 @@ class NEOProcess(object):
os.close(self._coverage_fd)
os.write(w, '\0')
sys.argv = [command] + args
getattr(neo.scripts, command).main()
setproctitle(self.command)
self.run()
status = 0
except SystemExit, e:
status = e.code
......@@ -203,6 +198,9 @@ class NEOProcess(object):
logging.info('pid %u: %s %s',
self.pid, command, ' '.join(map(repr, args)))
def run(self):
raise NotImplementedError
def child_coverage(self):
r = self._coverage_fd
if r is not None:
......@@ -249,11 +247,32 @@ class NEOProcess(object):
self.kill()
self.wait()
def getPID(self):
return self.pid
def isAlive(self):
try:
return psutil.Process(self.pid).status() != psutil.STATUS_ZOMBIE
except psutil.NoSuchProcess:
return False
class NEOProcess(Process):
def __init__(self, command, uuid, arg_dict):
try:
__import__('neo.scripts.' + command, level=0)
except ImportError:
raise NotFound(command + ' not found')
super(NEOProcess, self).__init__(command, arg_dict)
self.setUUID(uuid)
def _args(self):
args = super(NEOProcess, self)._args()
if self.uuid:
args += '--uuid', str(self.uuid)
return args
def run(self):
getattr(neo.scripts, self.command).main()
def getUUID(self):
assert self.with_uuid, 'UUID disabled on this process'
return self.uuid
def setUUID(self, uuid):
......@@ -262,12 +281,6 @@ class NEOProcess(object):
"""
self.uuid = uuid
def isAlive(self):
try:
return psutil.Process(self.pid).status() != psutil.STATUS_ZOMBIE
except psutil.NoSuchProcess:
return False
class NEOCluster(object):
SSL = None
......@@ -304,14 +317,8 @@ class NEOCluster(object):
IP_VERSION_FORMAT_DICT[self.address_type]
self.setupDB(clear_databases)
if importer:
cfg = SafeConfigParser()
cfg.add_section("neo")
cfg.set("neo", "adapter", adapter)
cfg = ImporterConfigParser(adapter, **importer)
cfg.set("neo", "database", self.db_template(*db_list))
for name, zodb in importer:
cfg.add_section(name)
for x in zodb.iteritems():
cfg.set(name, *x)
importer_conf = os.path.join(temp_dir, 'importer.cfg')
with open(importer_conf, 'w') as f:
cfg.write(f)
......
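The new `Process._args` method above turns `arg_dict` into a flat list of command-line arguments, where a `None` value denotes a flag without a parameter. A minimal sketch of that conversion follows; `build_args` and the option names are hypothetical, and the keys are sorted here only to make the result deterministic, which the original does not do:

```python
def build_args(arg_dict):
    """Convert {'name': value} into ['--name', 'value', ...]."""
    args = []
    for arg, param in sorted(arg_dict.items()):
        args.append('--' + arg)
        if param is not None:
            # None means a bare flag; anything else is stringified.
            args.append(str(param))
    return args

# Hypothetical options, for illustration only.
args = build_args({'cluster': 'test', 'verbose': None, 'replicas': 1})
# args == ['--cluster', 'test', '--replicas', '1', '--verbose']
```

Subclasses such as `NEOProcess` can then extend the returned list, as the diff does when it appends `--uuid`.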
......@@ -202,9 +202,9 @@ class ClientTests(NEOFunctionalTest):
self.neo.stop()
self.neo = NEOCluster(db_list=['test_neo1'], partitions=3,
importer=[("root", {
importer={"zodb": [("root", {
"storage": "<filestorage>\npath %s\n</filestorage>"
% dfs_storage.getName()})],
% dfs_storage.getName()})]},
temp_dir=self.getTempDirectory())
self.neo.start()
neo_db, neo_conn = self.neo.getZODBConnection()
......
......@@ -15,7 +15,7 @@
# along with this program. If not, see <http://www.gnu.org/licenses/>.
import random, time, unittest
from collections import defaultdict
from collections import Counter, defaultdict
from .. import NeoUnitTestBase
from neo.lib import logging
from neo.lib.protocol import NodeStates, CellStates
......@@ -291,13 +291,17 @@ class MasterPartitionTableTests(NeoUnitTestBase):
self.update(pt, self.tweak(pt, sn[:1]))
self.assertPartitionTable(pt, '.U.|..U|.U.|..U|.U.|..U|.U.')
def test_18_tweak(self):
s = repr(time.time())
logging.info("using seed %r", s)
r = random.Random(s)
def test_18_tweakBigPT(self):
seed = repr(time.time())
logging.info("using seed %r", seed)
sn_count = 11
sn = [self.createStorage(None, i + 1, NodeStates.RUNNING)
for i in xrange(sn_count)]
for topo in 0, 1:
r = random.Random(seed)
if topo:
for i, s in enumerate(sn, sn_count):
s.devpath = str(i % 5),
pt = PartitionTable(1000, 2)
pt.setID(1)
for offset in xrange(pt.np):
......@@ -311,6 +315,70 @@ class MasterPartitionTableTests(NeoUnitTestBase):
self.tweak(pt)
self.update(pt)
def test_19_topology(self):
sn_count = 16
sn = [self.createStorage(None, i + 1, NodeStates.RUNNING)
for i in xrange(sn_count)]
pt = PartitionTable(48, 2)
pt.make(sn)
pt.log()
for i, s in enumerate(sn, sn_count):
s.devpath = tuple(bin(i)[3:-1])
self.assertEqual(Counter(x[2] for x in self.tweak(pt)), {
CellStates.OUT_OF_DATE: 96,
CellStates.FEEDING: 96,
})
self.update(pt)
x = lambda n, *x: ('|'.join(x[:1]*n), '|'.join(x[1:]*n))
for even, np, i, topo, expected in (
## Optimal topology.
# All nodes have same number of cells.
(1, 2, 2, ("00", "01", "02", "10", "11", "12"), ('UU...U|..UUU.',
'UU.U..|..U.UU')),
(1, 7, 1, "0001122", (
'U.....U|.U.U...|..U.U..|U....U.|.U....U|..UU...|....UU.',
'U..U...|.U...U.|..U.U..|U.....U|.U.U...|..U..U.|....U.U')),
(1, 4, 1, "00011122", ('U......U|.U.U....|..U.U...|.....UU.',
'U..U....|.U..U...|..U...U.|.....U.U')),
(1, 9, 1, "000111222", ('U.......U|.U.U.....|..U.U....|'
'.....UU..|U......U.|.U......U|'
'..UU.....|....U.U..|.....U.U.',
'U..U.....|.U....U..|..U.U....|'
'.....U.U.|U.......U|.U.U.....|'
'..U...U..|....U..U.|.....U..U')),
# Some nodes have an extra cell.
(0, 8, 1, "0001122", ('U.....U|.U.U...|..U.U..|U....U.|'
'.U....U|..UU...|....UU.|U.....U',
'U..U...|.U...U.|..U.U..|U.....U|'
'.U.U...|..U..U.|....U.U|U..U...')),
## Topology ignored.
(1, 6, 1, ("00", "01", "1"), 'UU.|U.U|.UU|UU.|U.U|.UU'),
(1, 5, 2, "01233", 'UUU..|U..UU|.UUU.|UU..U|..UUU'),
):
assert len(topo) <= sn_count
sn2 = sn[:len(topo)]
for s in sn2:
s.devpath = ()
k = (1,7)[even]
pt = PartitionTable(np*k, i)
pt.make(sn2)
for devpath, s in zip(topo, sn2):
s.devpath = tuple(devpath)
if type(expected) is tuple:
self.assertTrue(self.tweak(pt))
self.update(pt)
self.assertPartitionTable(pt, '|'.join(expected[:1]*k))
pt.clear()
pt.make(sn2)
self.assertPartitionTable(pt, '|'.join(expected[1:]*k))
self.assertFalse(pt.tweak())
else:
expected = '|'.join((expected,)*k)
self.assertFalse(pt.tweak())
self.assertPartitionTable(pt, expected)
pt.clear()
pt.make(sn2)
self.assertPartitionTable(pt, expected)
if __name__ == '__main__':
unittest.main()
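To make the device paths assigned in test_19_topology easier to picture, here is a small standalone sketch (not part of the commit) that evaluates the same `tuple(bin(i)[3:-1])` expression for the 16 indexes 16..31 and groups the nodes by resulting path. The helper name `devpath_for` is made up for illustration.

```python
from collections import Counter

def devpath_for(i):
    # For 16 <= i < 32, bin(i) looks like '0b1xxxx'. The slice [3:-1]
    # drops the '0b1' prefix and the last bit, keeping 3 bits in between.
    return tuple(bin(i)[3:-1])

sn_count = 16
paths = [devpath_for(i) for i in range(sn_count, 2 * sn_count)]
# Number of nodes that share each 3-level path.
groups = Counter(paths)
```

Under these assumptions the 16 nodes fall into 8 distinct three-level paths with 2 nodes each, which is the layout the test then feeds to tweak().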
......
#
# Copyright (C) 2018 Nexedi SA
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
import hashlib, random
from collections import deque
from itertools import islice
from persistent import Persistent
from BTrees.IOBTree import IOBTree
from .stat_zodb import _DummyData
def generateTree(random=random):
tree = []
N = 5
fifo = deque()
path = ()
size = lambda: max(int(random.gauss(40,30)), 0)
while 1:
tree.extend(path + (i, size())
for i in xrange(-random.randrange(N), 0))
n = N * (1 - len(path)) + random.randrange(N)
for i in xrange(n):
fifo.append(path + (i,))
try:
path = fifo.popleft()
except IndexError:
break
change = tree
while change:
change = [x[:-1] + (size(),) for x in change if random.randrange(2)]
tree += change
random.shuffle(tree)
return tree
class Leaf(Persistent):
pass
Node = IOBTree
def importTree(root, tree, yield_interval=None, filter=None):
n = 0
for path in tree:
node = root
for i, x in enumerate(path[:-1], 1):
if filter and not filter(path[:i]):
break
if x < 0:
try:
node = node[x]
except KeyError:
node[x] = node = Leaf()
node.data = bytes(_DummyData(random.Random(path), path[-1]))
else:
try:
node = node[x]
continue
except KeyError:
node[x] = node = Node()
n += 1
if n == yield_interval:
n = 0
yield root
if n:
yield root
class hashTree(object):
_hash = None
_new = hashlib.md5
def __init__(self, node):
s = [((), node)]
def walk():
h = self._new()
update = h.update
while s:
top, node = s.pop()
try:
update('%s %s %s\n' % (top, len(node.data),
self._new(node.data).hexdigest()))
yield
except AttributeError:
update('%s %s\n' % (top, tuple(node.keys())))
yield
for k, v in reversed(node.items()):
s.append((top + (k,), v))
del self._walk
self._hash = h
self._walk = walk()
def __getattr__(self, attr):
return getattr(self._hash, attr)
def __call__(self, n=None):
if n is None:
return sum(1 for _ in self._walk)
next(islice(self._walk, n - 1, None))
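The `__call__` method of hashTree either exhausts the walk and counts the remaining steps, or advances it by exactly n steps using `islice`. The following sketch isolates that idiom with a dummy generator; `steps` and `advance` are illustrative names, not part of the commit.

```python
from itertools import islice

def steps():
    # Stand-in for hashTree's internal walk(): one bare yield per node.
    for _ in range(5):
        yield

def advance(it, n=None):
    if n is None:
        # Exhaust the iterator and report how many steps were left.
        return sum(1 for _ in it)
    # Consume exactly n items: skip n - 1 of them, then take one more.
    # Raises StopIteration if fewer than n items remain.
    next(islice(it, n - 1, None))

walk = steps()
advance(walk, 2)
remaining = advance(walk)
```

This is how the importer test can hash part of the tree (`h(30)`), let the migration start, and then finish the walk with `h()`.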
......@@ -19,11 +19,13 @@ PROD1 = lambda random=random: DummyZODB(6.04237779991, 1.55811487853,
1.04108991045, 0.906703192546,
0.810080409164, random)
def DummyData(random=random):
def _DummyData(random, size):
# returns data that gzips at about 28.5 %
return bytearray(int(random.gauss(0, .8)) % 256 for x in xrange(size))
def DummyData(random=random):
# make sure sample is bigger than dictionary of compressor
data = ''.join(chr(int(random.gauss(0, .8)) % 256) for x in xrange(100000))
return StringIO(data)
return StringIO(_DummyData(random, 100000))
class DummyZODB(object):
......
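As a rough illustration of what `_DummyData` produces, the sketch below adapts the same generator expression to Python 3 and measures how well the result compresses. The name `dummy_data` and the use of `zlib` are assumptions made for this example. No particular ratio is claimed here.

```python
import random
import zlib

def dummy_data(rng, size):
    # Most samples of gauss(0, .8) truncate to 0; negative values wrap
    # around modulo 256, so the output is dominated by a few byte values.
    return bytes(int(rng.gauss(0, .8)) % 256 for _ in range(size))

data = dummy_data(random.Random(0), 100000)
# Fraction of the original size left after compression.
ratio = len(zlib.compress(data)) / float(len(data))
```

Passing an explicit `random.Random` instance, as the refactored function requires, makes the generated payload reproducible from a seed.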
......@@ -89,7 +89,7 @@ class StorageDBTests(NeoUnitTestBase):
self.db.lockTransaction(tid, ttid)
yield
if commit:
self.db.unlockTransaction(tid, ttid)
self.db.unlockTransaction(tid, ttid, True, objs)
self.db.commit()
elif commit is not None:
self.db.abortTransaction(ttid)
......@@ -227,6 +227,7 @@ class StorageDBTests(NeoUnitTestBase):
def test_changePartitionTable(self):
db = self.getDB()
db.setNumPartitions(3)
ptid = 1
uuid = self.getStorageUUID()
cell1 = 0, uuid, CellStates.OUT_OF_DATE
......@@ -253,7 +254,7 @@ class StorageDBTests(NeoUnitTestBase):
txn1, objs1 = self.getTransaction([oid1])
txn2, objs2 = self.getTransaction([oid2])
# nothing in database
self.assertEqual(self.db.getLastIDs(), (None, {}, {}, None))
self.assertEqual(self.db.getLastIDs(), (None, None))
self.assertEqual(self.db.getUnfinishedTIDDict(), {})
self.assertEqual(self.db.getObject(oid1), None)
self.assertEqual(self.db.getObject(oid2), None)
......@@ -319,13 +320,17 @@ class StorageDBTests(NeoUnitTestBase):
expected = [(t, oid_list[offset+i]) for t in tids for i in (0, np)]
self.assertEqual(self.db.getReplicationObjectList(ZERO_TID,
MAX_TID, len(expected) + 1, offset, ZERO_OID), expected)
self.db._deleteRange(0, MAX_TID)
self.db._deleteRange(0, max_tid=ZERO_TID)
def deleteRange(partition, min_tid=None, max_tid=None):
self.db._deleteRange(partition,
None if min_tid is None else u64(min_tid),
None if max_tid is None else u64(max_tid))
deleteRange(0, MAX_TID)
deleteRange(0, max_tid=ZERO_TID)
check(0, [], t1, t2, t3)
self.db._deleteRange(0); check(0, [])
self.db._deleteRange(1, t2); check(1, [t1], t1, t2)
self.db._deleteRange(2, max_tid=t2); check(2, [], t3)
self.db._deleteRange(3, t1, t2); check(3, [t3], t1, t3)
deleteRange(0); check(0, [])
deleteRange(1, t2); check(1, [t1], t1, t2)
deleteRange(2, max_tid=t2); check(2, [], t3)
deleteRange(3, t1, t2); check(3, [t3], t1, t3)
def test_getTransaction(self):
oid1, oid2 = self.getOIDs(2)
......
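The `deleteRange` wrapper in the test above converts packed TIDs to integers before calling `_deleteRange`, leaving `None` untouched. A minimal sketch of that conversion follows. The `p64` and `u64` defined here are stand-ins written for this example, assuming the 8-byte big-endian encoding, and are not imported from neo.lib.util.

```python
from struct import Struct

_q = Struct('!Q')  # big-endian unsigned 64-bit integer

def p64(n):
    # Pack an integer into 8 bytes.
    return _q.pack(n)

def u64(s):
    # Unpack 8 bytes into an integer.
    return _q.unpack(s)[0]

def to_int_or_none(tid):
    # Same pattern as the wrapper: None means "no bound on this side".
    return None if tid is None else u64(tid)
```

Keeping `None` distinct from a zero TID matters, since the test calls `deleteRange(0, max_tid=ZERO_TID)` and `deleteRange(0)` as separate cases.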
......@@ -15,17 +15,32 @@
# along with this program. If not, see <http://www.gnu.org/licenses/>.
import unittest
from MySQLdb import NotSupportedError, OperationalError
from contextlib import contextmanager
from MySQLdb import NotSupportedError, OperationalError, ProgrammingError
from MySQLdb.constants.CR import SERVER_GONE_ERROR
from MySQLdb.constants.ER import UNKNOWN_STORAGE_ENGINE
from ..mock import Mock
from neo.lib.exception import DatabaseFailure
from neo.lib.protocol import ZERO_OID
from neo.lib.util import p64
from .. import DB_PREFIX, DB_SOCKET, DB_USER
from .. import DB_PREFIX, DB_SOCKET, DB_USER, Patch
from .testStorageDBTests import StorageDBTests
from neo.storage.database import DatabaseFailure
from neo.storage.database.mysqldb import MySQLDatabaseManager
class ServerGone(object):
@contextmanager
def __new__(cls, db):
self = object.__new__(cls)
with Patch(db, conn=self) as self._p:
yield self._p
def query(self, *args):
self._p.revert()
raise OperationalError(SERVER_GONE_ERROR, 'this is a test')
class StorageMySQLdbTests(StorageDBTests):
engine = None
......@@ -67,23 +82,9 @@ class StorageMySQLdbTests(StorageDBTests):
calls[0].checkArgs('SELECT ')
def test_query2(self):
# test the OperationalError exception
# fake object, raise exception during the first call
from MySQLdb.constants.CR import SERVER_GONE_ERROR
class FakeConn(object):
def query(*args):
raise OperationalError(SERVER_GONE_ERROR, 'this is a test')
self.db.conn = FakeConn()
self.connect_called = False
def connect_hook():
# mock object, break raise/connect loop
self.db.conn = Mock()
self.connect_called = True
self.db._connect = connect_hook
# make a query, exception will be raised then connect() will be
# called and the second query will use the mock object
self.db.query('INSERT')
self.assertTrue(self.connect_called)
with ServerGone(self.db) as p:
self.assertRaises(ProgrammingError, self.db.query, 'QUERY')
self.assertFalse(p.applied)
def test_query3(self):
# OperationalError > raise DatabaseFailure exception
......
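The ServerGone helper above decorates `__new__` with `@contextmanager`, so that calling the class returns a context manager rather than an instance. The sketch below shows the same construction on a toy class. `Resource` and its `log` argument are hypothetical names, and this example only demonstrates the mechanism, not the MySQL reconnection behaviour.

```python
from contextlib import contextmanager

class Resource(object):
    @contextmanager
    def __new__(cls, log):
        # Build the real instance by hand; the caller never sees the
        # class constructor's usual return value.
        self = object.__new__(cls)
        log.append('enter')
        try:
            yield self
        finally:
            log.append('exit')

log = []
with Resource(log) as r:
    inside = list(log)
```

Because the object returned by `__new__` is not an instance of the class, `__init__` is not called, which is why ServerGone does all of its setup inside `__new__`.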
......@@ -21,6 +21,8 @@ from neo.lib.util import ReadBuffer, parseNodeAddress
class UtilTests(NeoUnitTestBase):
from neo.storage.shared_queue import test as testSharedQueue
def test_parseNodeAddress(self):
""" Parsing of addresses """
def test(parsed, *args):
......
......@@ -19,7 +19,6 @@
import os, random, select, socket, sys, tempfile
import thread, threading, time, traceback, weakref
from collections import deque
from ConfigParser import SafeConfigParser
from contextlib import contextmanager
from itertools import count
from functools import partial, wraps
......@@ -37,8 +36,9 @@ from neo.lib.handler import EventHandler
from neo.lib.locking import SimpleQueue
from neo.lib.protocol import ClusterStates, Enum, NodeStates, NodeTypes, Packets
from neo.lib.util import cached_property, parseMasterList, p64
from .. import NeoTestBase, Patch, getTempDirectory, setupMySQLdb, \
ADDRESS_TYPE, IP_VERSION_FORMAT_DICT, DB_PREFIX, DB_SOCKET, DB_USER
from .. import (getTempDirectory, setupMySQLdb,
ImporterConfigParser, NeoTestBase, Patch,
ADDRESS_TYPE, IP_VERSION_FORMAT_DICT, DB_PREFIX, DB_SOCKET, DB_USER)
BIND = IP_VERSION_FORMAT_DICT[ADDRESS_TYPE], 0
LOCAL_IP = socket.inet_pton(ADDRESS_TYPE, IP_VERSION_FORMAT_DICT[ADDRESS_TYPE])
......@@ -185,6 +185,8 @@ class Serialized(object):
# a single-core CPU, other threads are still busy and haven't
# sent anything yet on the network. This causes tic() to
# return prematurely. Passing a non-zero value is a hack.
# We also increase SocketConnector.SOMAXCONN in tests so that
# a connection attempt is never delayed inside the kernel.
timeout=0):
# If you're in a pdb here, 'n' switches to another thread
# (the following lines are not supposed to be debugged into)
......@@ -634,6 +636,7 @@ class NEOCluster(object):
Patch(BaseConnection, getTimeout=lambda orig, self: None),
Patch(SimpleQueue, __init__=__init__),
Patch(SocketConnector, CONNECT_LIMIT=0),
Patch(SocketConnector, SOMAXCONN=128), # see Serialized.tic comment
Patch(SocketConnector, _bind=lambda orig, self, addr: orig(self, BIND)),
Patch(SocketConnector, _connect = lambda orig, self, addr:
orig(self, ServerNode.resolv(addr))))
......@@ -674,8 +677,8 @@ class NEOCluster(object):
adapter=os.getenv('NEO_TESTS_ADAPTER', 'SQLite'),
storage_count=None, db_list=None, clear_databases=True,
db_user=DB_USER, db_password='', compress=True,
importer=None, autostart=None, dedup=False):
self.name = 'neo_%s' % self._allocate('name',
importer=None, autostart=None, dedup=False, name=None):
self.name = name or 'neo_%s' % self._allocate('name',
lambda: random.randint(0, 100))
self.compress = compress
self.num_partitions = partitions
......@@ -707,14 +710,8 @@ class NEOCluster(object):
else:
assert False, adapter
if importer:
cfg = SafeConfigParser()
cfg.add_section("neo")
cfg.set("neo", "adapter", adapter)
cfg = ImporterConfigParser(adapter, **importer)
cfg.set("neo", "database", db % tuple(db_list))
for name, zodb in importer:
cfg.add_section(name)
for x in zodb.iteritems():
cfg.set(name, *x)
db = os.path.join(getTempDirectory(), '%s.conf')
with open(db % tuple(db_list), "w") as f:
cfg.write(f)
......@@ -813,7 +810,7 @@ class NEOCluster(object):
else NodeStates.RUNNING)
for node in self.storage_list if storage_list is None else storage_list:
state = self.getNodeState(node)
assert state == expected_state, (node, state)
assert state == expected_state, (repr(node), state)
def stop(self, clear_database=False, __print_exc=traceback.print_exc, **kw):
if self.started:
......@@ -933,10 +930,9 @@ class NEOCluster(object):
if dummy_zodb is None:
from ..stat_zodb import PROD1
dummy_zodb = PROD1(random)
preindex = {}
as_storage = dummy_zodb.as_storage
return lambda count: self.getZODBStorage().importFrom(
as_storage(count), preindex=preindex)
return lambda count: self.getZODBStorage().copyTransactionsFrom(
as_storage(count))
def populate(self, transaction_list, tid=lambda i: p64(i+1),
oid=lambda i: p64(i+1)):
......@@ -1061,7 +1057,11 @@ class NEOThreadedTest(NeoTestBase):
with Patch(client, _getFinalTID=lambda *_: None):
self.assertRaises(ConnectionClosed, txn.commit)
def assertPartitionTable(self, cluster, expected, pt_node=None):
def assertPartitionTable(self, cluster, expected, pt_node=None,
sort_by_nid=False):
if sort_by_nid:
index = lambda x: x
else:
index = [x.uuid for x in cluster.storage_list].index
super(NEOThreadedTest, self).assertPartitionTable(
(pt_node or cluster.admin).pt, expected,
......
......@@ -23,7 +23,6 @@ import unittest
from collections import defaultdict
from contextlib import contextmanager
from thread import get_ident
from zlib import compress
from persistent import Persistent, GHOST
from transaction.interfaces import TransientError
from ZODB import DB, POSException
......@@ -31,7 +30,7 @@ from ZODB.DB import TransactionalUndo
from neo.storage.transactions import TransactionManager, ConflictError
from neo.lib.connection import ConnectionClosed, \
ServerConnection, MTClientConnection
from neo.lib.exception import DatabaseFailure, StoppedOperation
from neo.lib.exception import StoppedOperation
from neo.lib.handler import DelayEvent, EventHandler
from neo.lib import logging
from neo.lib.protocol import (CellStates, ClusterStates, NodeStates, NodeTypes,
......@@ -43,6 +42,7 @@ from neo.lib.util import add64, makeChecksum, p64, u64
from neo.client.exception import NEOPrimaryMasterLost, NEOStorageError
from neo.client.transactions import Transaction
from neo.master.handlers.client import ClientServiceHandler
from neo.storage.database import DatabaseFailure
from neo.storage.handlers.client import ClientOperationHandler
from neo.storage.handlers.identification import IdentificationHandler
from neo.storage.handlers.initialization import InitializationHandler
......@@ -60,12 +60,13 @@ class PCounterWithResolution(PCounter):
class Test(NEOThreadedTest):
@with_cluster()
def testBasicStore(self, cluster):
if 1:
def testBasicStore(self, dedup=False):
with NEOCluster(dedup=dedup) as cluster:
cluster.start()
storage = cluster.getZODBStorage()
storage.sync()
storage.app.max_reconnection_to_master = 0
compress = storage.app.compress._compress
data_info = {}
compressible = 'x' * 20
compressed = compress(compressible)
......@@ -137,27 +138,6 @@ class Test(NEOThreadedTest):
self.assertRaises(POSException.POSKeyError,
storage.load, oid, '')
@with_cluster()
def testCreationUndoneHistory(self, cluster):
if 1:
storage = cluster.getZODBStorage()
oid = storage.new_oid()
txn = transaction.Transaction()
storage.tpc_begin(txn)
storage.store(oid, None, 'foo', '', txn)
storage.tpc_vote(txn)
tid1 = storage.tpc_finish(txn)
storage.tpc_begin(txn)
storage.undo(tid1, txn)
tid2 = storage.tpc_finish(txn)
storage.tpc_begin(txn)
storage.undo(tid2, txn)
tid3 = storage.tpc_finish(txn)
expected = [(tid1, 3), (tid2, 0), (tid3, 3)]
for x in storage.history(oid, 10):
self.assertEqual((x['tid'], x['size']), expected.pop())
self.assertFalse(expected)
def _testUndoConflict(self, cluster, *inc):
def waitResponses(orig, *args):
orig(*args)
......@@ -738,8 +718,9 @@ class Test(NEOThreadedTest):
@with_cluster()
def testStorageUpgrade1(self, cluster):
if 1:
storage = cluster.storage
# Disable migration steps that aren't idempotent.
with Patch(storage.dm.__class__, _migrate3=lambda *_: None):
t, c = cluster.getTransaction()
storage.dm.setConfiguration("version", None)
c.root()._p_changed = 1
......@@ -1309,7 +1290,7 @@ class Test(NEOThreadedTest):
s1.resetNode()
with Patch(s1.dm, truncate=dieFirst(1)):
s1.start()
self.assertEqual(s0.dm.getLastIDs()[0], truncate_tid)
self.assertFalse(s0.dm.getLastIDs()[0])
self.assertEqual(s1.dm.getLastIDs()[0], r._p_serial)
self.tic()
self.assertEqual(calls, [1, 2])
......@@ -1723,7 +1704,18 @@ class Test(NEOThreadedTest):
x.value += 1
c2.root()['x'].value += 2
TransactionalResource(t1, 1, tpc_begin=begin1)
s1m, = s1.getConnectionList(cluster.master)
# BUG: Very rarely, getConnectionList returns more than 1
# connection ("too many values to unpack"), which is
# a mystery and impossible to reproduce:
# - 1st time: v1.8.1 on a test machine (no SSL)
# - last: current revision on my laptop (SSL),
# at the first iteration of this loop
_sm = list(s1.getConnectionList(cluster.master))
try:
s1m, = _sm
except ValueError:
self.fail((_sm, list(
s1.getConnectionList(cluster.master))))
try:
s1.em.removeReader(s1m)
with ConnectionFilter() as f, \
......@@ -2371,7 +2363,7 @@ class Test(NEOThreadedTest):
oid, tid = big_id_list[i]
for j, expected in (
(1 - i, (dm.getLastTID(u64(MAX_TID)), dm.getLastIDs())),
(i, (u64(tid), (tid, {}, {}, oid)))):
(i, (u64(tid), (tid, oid)))):
oid, tid = big_id_list[j]
# Somehow we abuse 'storeTransaction' because we ask it to
# write data for unassigned partitions. This is not checked
......@@ -2381,6 +2373,44 @@ class Test(NEOThreadedTest):
self.assertEqual(expected,
(dm.getLastTID(u64(MAX_TID)), dm.getLastIDs()))
def testStorageUpgrade(self):
path = os.path.join(os.path.dirname(__file__),
self._testMethodName + '-%s',
's%s.sql')
dump_dict = {}
def switch(s):
dm = s.dm
dm.commit()
dump_dict[s.uuid] = dm.dump()
dm.erase()
with open(path % (s.getAdapter(), s.uuid)) as f:
dm.restore(f.read())
with NEOCluster(storage_count=3, partitions=3, replicas=1,
name=self._testMethodName) as cluster:
s1, s2, s3 = cluster.storage_list
cluster.start(storage_list=(s1,))
for s in s2, s3:
s.start()
self.tic()
cluster.neoctl.enableStorageList([s.uuid])
cluster.neoctl.tweakPartitionTable()
self.tic()
nid_list = [s.uuid for s in cluster.storage_list]
switch(s3)
s3.stop()
storage = cluster.getZODBStorage()
txn = transaction.Transaction()
storage.tpc_begin(txn, p64(85**9)) # partition 1
storage.store(p64(0), None, 'foo', '', txn)
storage.tpc_vote(txn)
storage.tpc_finish(txn)
self.tic()
switch(s1)
switch(s2)
cluster.stop()
for i, s in zip(nid_list, cluster.storage_list):
self.assertMultiLineEqual(s.dm.dump(), dump_dict[i])
if __name__ == "__main__":
unittest.main()
#
# Copyright (C) 2018 Nexedi SA
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
import unittest
from contextlib import contextmanager
from ZConfig import ConfigurationSyntaxError
from ZODB.config import databaseFromString
from .. import Patch
from . import ClientApplication, NEOThreadedTest, with_cluster
from neo.client import Storage
def databaseFromDict(**kw):
return databaseFromString("%%import neo.client\n"
"<zodb>\n <NEOStorage>\n%s </NEOStorage>\n</zodb>\n"
% ''.join(' %s %s\n' % x for x in kw.iteritems()))
class ConfigTests(NEOThreadedTest):
dummy_required = {'name': 'cluster', 'master_nodes': '127.0.0.1:10000'}
@contextmanager
def _db(self, cluster, **kw):
kw['name'] = cluster.name
kw['master_nodes'] = cluster.master_nodes
def newClient(_, *args, **kw):
client = ClientApplication(*args, **kw)
t.append(client.poll_thread)
return client
t = []
with Patch(Storage, Application=newClient):
db = databaseFromDict(**kw)
try:
yield db
finally:
db.close()
cluster.join(t)
@with_cluster()
def testCompress(self, cluster):
kw = self.dummy_required.copy()
valid = ['false', 'true', 'zlib', 'zlib=9']
for kw['compress'] in '9', 'best', 'zlib=0', 'zlib=100':
self.assertRaises(ConfigurationSyntaxError, databaseFromDict, **kw)
for compress in valid:
with self._db(cluster, compress=compress) as db:
self.assertEqual((0,0,''), db.storage.app.compress(''))
if __name__ == "__main__":
unittest.main()
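For reference, `databaseFromDict` builds a ZConfig snippet from keyword arguments and hands it to `databaseFromString`. The sketch below only reproduces the string formatting, adapted to Python 3. `neo_storage_config` is a hypothetical helper, the indentation widths are a guess (the diff view above lost them), and keys are sorted here purely to make the output deterministic.

```python
def neo_storage_config(**kw):
    # One "key value" line per option, inside the <NEOStorage> section.
    body = ''.join('    %s %s\n' % x for x in sorted(kw.items()))
    return ("%%import neo.client\n"
            "<zodb>\n  <NEOStorage>\n%s  </NEOStorage>\n</zodb>\n" % body)

text = neo_storage_config(name='cluster',
                          master_nodes='127.0.0.1:10000',
                          compress='zlib=9')
```

In testCompress, values such as `'9'` or `'zlib=100'` are expected to be rejected with ConfigurationSyntaxError when ZConfig parses a snippet of this form.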
......@@ -16,20 +16,19 @@
from cPickle import Pickler, Unpickler
from cStringIO import StringIO
from itertools import islice, izip_longest
import os, shutil, unittest
import neo, transaction, ZODB
from itertools import izip_longest
import os, random, shutil, time, unittest
import transaction, ZODB
from neo.client.exception import NEOPrimaryMasterLost
from neo.lib import logging
from neo.lib.util import u64
from neo.storage.database.importer import Repickler
from ..fs2zodb import Inode
from .. import expectedFailure, getTempDirectory
from neo.storage.database import getAdapterKlass, importer, manager
from neo.storage.database.importer import Repickler, TransactionRecord
from .. import expectedFailure, getTempDirectory, random_tree, Patch
from . import NEOCluster, NEOThreadedTest
from ZODB import serialize
from ZODB.FileStorage import FileStorage
class Equal:
_recurse = {}
......@@ -129,61 +128,58 @@ class ImporterTests(NEOThreadedTest):
self.assertIs(Obj, load())
self.assertDictEqual(state, load())
def test(self):
# XXX: Using NEO source files as test data was a bad idea because
# the test breaks easily in case of massive changes in the code,
# or if there are many untracked files.
importer = []
def _importFromFileStorage(self, multi=(),
root_filter=None, sub_filter=None):
import_hash = '1d4ff03730fe6bcbf235e3739fbe5f5b'
txn_size = 10
tree = random_tree.generateTree(random.Random(0))
i = len(tree) // 3
assert i > txn_size
before_tree = tree[:i]
after_tree = tree[i:]
fs_dir = os.path.join(getTempDirectory(), self.id())
shutil.rmtree(fs_dir, 1) # for --loop
os.mkdir(fs_dir)
src_root, = neo.__path__
fs_list = "root", "client", "master", "tests"
def not_pyc(name):
return not name.endswith(".pyc")
# We use 'hash' to skip roughly half of files.
# They'll be added after the migration has started.
def root_filter(name):
if not_pyc(name):
i = name.find(os.sep)
return (i < 0 or name[:i] not in fs_list) and (
'.' not in name or hash(name) & 1)
def sub_filter(name):
return lambda n: not_pyc(n) and (
hash(n) & 1 if '.' in n else
os.sep in n or n in (name, "scripts"))
conn_list = []
iter_list = []
db_list = []
# Setup several FileStorage databases.
for i, name in enumerate(fs_list):
fs_path = os.path.join(fs_dir, name + ".fs")
for i, db in enumerate(('root',) + multi):
fs_path = os.path.join(fs_dir, '%s.fs' % db)
c = ZODB.DB(FileStorage(fs_path)).open()
r = c.root()["neo"] = Inode()
r = c.root()['tree'] = random_tree.Node()
transaction.commit()
conn_list.append(c)
iter_list.append(r.treeFromFs(src_root, 10,
sub_filter(name) if i else root_filter))
importer.append((name, {
iter_list.append(random_tree.importTree(r, before_tree, txn_size,
sub_filter(db) if i else root_filter))
db_list.append((db, r, {
"storage": "<filestorage>\npath %s\n</filestorage>" % fs_path
}))
# Populate FileStorage databases.
for iter_list in izip_longest(*iter_list):
for i in iter_list:
if i:
for i, iter_list in enumerate(izip_longest(*iter_list)):
for r in iter_list:
if r:
transaction.commit()
del iter_list
# Get oids of mount points and close.
for (name, cfg), c in zip(importer, conn_list):
r = c.root()["neo"]
if name == "root":
for name in fs_list[1:]:
cfg[name] = str(u64(r[name]._p_oid))
zodb = []
importer = {'zodb': zodb}
for db, r, cfg in db_list:
if db == 'root':
if multi:
for x in multi:
cfg['_%s' % x] = str(u64(r[x]._p_oid))
else:
h = random_tree.hashTree(r)
h()
self.assertEqual(import_hash, h.hexdigest())
importer['writeback'] = 'true'
else:
cfg["oid"] = str(u64(r[name]._p_oid))
c.db().close()
#del importer[0][1][importer.pop()[0]]
# Start NEO cluster with transparent import of a multi-base ZODB.
with NEOCluster(compress=False, importer=importer) as cluster:
cfg["oid"] = str(u64(r[db]._p_oid))
db = '_%s' % db
r._p_jar.db().close()
zodb.append((db, cfg))
del db_list, iter_list
#del zodb[0][1][zodb.pop()[0]]
# Start NEO cluster with transparent import.
with NEOCluster(importer=importer) as cluster:
# Suspend import for a while, so that import
# is finished in the middle of the below 'for' loop.
# Use a slightly different main loop for storage so that it
......@@ -202,7 +198,7 @@ class ImporterTests(NEOThreadedTest):
dm.doOperation = doOperation
cluster.start()
t, c = cluster.getTransaction()
r = c.root()["neo"]
r = c.root()['tree']
# Test retrieving of an object from ZODB when next serial is in NEO.
r._p_changed = 1
t.commit()
......@@ -213,31 +209,81 @@ class ImporterTests(NEOThreadedTest):
##
self.assertRaisesRegexp(NotImplementedError, " getObjectHistory$",
c.db().history, r._p_oid)
i = r.walk()
next(islice(i, 4, None))
h = random_tree.hashTree(r)
h(30)
logging.info("start migration")
dm.doOperation(cluster.storage)
# Adjust if needed. Must remain > 0.
assert 14 == sum(1 for i in i)
self.assertEqual(22, h())
self.assertEqual(import_hash, h.hexdigest())
# New writes after the switch to NEO.
last_import = -1
for i, r in enumerate(r.treeFromFs(src_root, 6, not_pyc)):
for i, r in enumerate(random_tree.importTree(
r, after_tree, txn_size)):
t.commit()
if cluster.storage.dm._import:
last_import = i
self.tic()
# Same as above. We want last_import to be sufficiently smaller than i
assert i / 3 < last_import < i - 2, (last_import, i)
assert i < last_import * 3 < 2 * i, (last_import, i)
self.assertFalse(cluster.storage.dm._import)
i = len(src_root) + 1
self.assertEqual(sorted(r.walk()), sorted(
(x[i:] or '.', sorted(y), sorted(filter(not_pyc, z)))
for x, y, z in os.walk(src_root)))
t.commit()
storage._cache.clear()
def finalCheck(r):
h = random_tree.hashTree(r)
self.assertEqual(93, h())
self.assertEqual('6bf0f0cb2d6c1aae9e52c412ef0e25b6',
h.hexdigest())
finalCheck(r)
if dm._writeback:
dm.commit()
dm._writeback.wait()
if dm._writeback:
db = ZODB.DB(FileStorage(fs_path, read_only=True))
finalCheck(db.open().root()['tree'])
db.close()
@unittest.skipUnless(importer.FORK, 'no os.fork')
def test1(self):
self._importFromFileStorage()
def testThreadedWriteback(self):
# Also check reconnection to the underlying DB for relevant backends.
tid_list = []
def __init__(orig, tr, db, tid):
orig(tr, db, tid)
tid_list.append(tid)
def fetchObject(orig, db, *args):
if len(tid_list) == 5:
if isinstance(db, getAdapterKlass('MySQL')):
from neo.tests.storage.testStorageMySQL import ServerGone
with ServerGone(db):
orig(db, *args)
self.fail()
else:
tid_list.append(None)
p.revert()
return orig(db, *args)
def sleep(orig, seconds):
self.assertEqual(len(tid_list), 5)
p.revert()
with Patch(importer, FORK=False), \
Patch(TransactionRecord, __init__=__init__), \
Patch(manager.DatabaseManager, fetchObject=fetchObject), \
Patch(time, sleep=sleep) as p:
self._importFromFileStorage()
self.assertFalse(p.applied)
self.assertEqual(len(tid_list), 11)
def testMerge(self):
multi = 1, 2, 3
self._importFromFileStorage(multi,
(lambda path: path[0] not in multi or len(path) == 1),
(lambda db: lambda path: path[0] in (db, 4)))
if getattr(serialize, '_protocol', 1) > 1:
# XXX: With ZODB5, we should at least keep a working test that does not
# merge several DBs.
test = expectedFailure(NEOPrimaryMasterLost)(test)
testMerge = expectedFailure(NEOPrimaryMasterLost)(testMerge)
if __name__ == "__main__":
unittest.main()
......@@ -20,16 +20,18 @@ from ZODB.POSException import ReadOnlyError, POSKeyError
import unittest
from collections import defaultdict
from functools import wraps
from itertools import product
from neo.lib import logging
from neo.client.exception import NEOStorageError
from neo.master.handlers.backup import BackupHandler
from neo.storage.checker import CHECK_COUNT
from neo.storage.replicator import Replicator
from neo.storage.database.manager import DatabaseManager
from neo.storage import replicator
from neo.lib.connector import SocketConnector
from neo.lib.connection import ClientConnection
from neo.lib.protocol import CellStates, ClusterStates, Packets, \
ZERO_OID, ZERO_TID, MAX_TID, uuid_str
from neo.lib.util import p64, u64
from neo.lib.util import add64, p64, u64
from .. import expectedFailure, Patch, TransactionalResource
from . import ConnectionFilter, NEOCluster, NEOThreadedTest, \
predictable_random, with_cluster
......@@ -45,9 +47,9 @@ if 0:
def backup_test(partitions=1, upstream_kw={}, backup_kw={}):
def decorator(wrapped):
def wrapper(self):
with NEOCluster(partitions, **upstream_kw) as upstream:
with NEOCluster(partitions=partitions, **upstream_kw) as upstream:
upstream.start()
with NEOCluster(partitions, upstream=upstream,
with NEOCluster(partitions=partitions, upstream=upstream,
**backup_kw) as backup:
backup.start()
backup.neoctl.setClusterState(ClusterStates.STARTING_BACKUP)
......@@ -295,6 +297,7 @@ class ReplicationTests(NEOThreadedTest):
storage_list = [x.uuid for x in backup.storage_list]
slave = set(xrange(len(storage_list))).difference
for event in xrange(10):
logging.info("event=%s", event)
counts = [0]
if event == 5:
p = Patch(upstream.master.tm,
......@@ -394,6 +397,35 @@ class ReplicationTests(NEOThreadedTest):
self.tic()
self.assertTrue(backup.master.is_alive())
@backup_test()
def testCreationUndone(self, backup):
"""
Check both IStorage.history and replication when the DB contains a
deletion record.
XXX: This test reveals that without --dedup, the replication does not
preserve the deduplication that is done by the 'undo' code.
"""
storage = backup.upstream.getZODBStorage()
oid = storage.new_oid()
txn = transaction.Transaction()
storage.tpc_begin(txn)
storage.store(oid, None, 'foo', '', txn)
storage.tpc_vote(txn)
tid1 = storage.tpc_finish(txn)
storage.tpc_begin(txn)
storage.undo(tid1, txn)
tid2 = storage.tpc_finish(txn)
storage.tpc_begin(txn)
storage.undo(tid2, txn)
tid3 = storage.tpc_finish(txn)
expected = [(tid1, 3), (tid2, 0), (tid3, 3)]
for x in storage.history(oid, 10):
self.assertEqual((x['tid'], x['size']), expected.pop())
self.assertFalse(expected)
self.tic()
self.assertEqual(1, self.checkBackup(backup))
@backup_test()
def testBackupTid(self, backup):
"""
......@@ -426,19 +458,24 @@ class ReplicationTests(NEOThreadedTest):
orig(*args)
sys.exit()
s0, s1, s2 = cluster.storage_list
if 1:
cluster.start([s0, s1])
s2.start()
self.tic()
cluster.enableStorageList([s2])
# 2 UP_TO_DATE cells become FEEDING:
# they are dropped only when the replication is done,
# so that 1 storage can still die without data loss.
# they are "normally" (see below) dropped only when the replication
# is done, so that 1 storage can still die without data loss.
with Patch(s0.dm, changePartitionTable=changePartitionTable):
cluster.neoctl.tweakPartitionTable()
self.tic()
self.assertEqual(cluster.neoctl.getClusterState(),
ClusterStates.RUNNING)
# 1 of the FEEDING cells was actually discarded immediately when it got
# out-of-date, so that we don't end up with too many up-to-date cells.
s0.resetNode()
s0.start()
self.tic()
self.assertPartitionTable(cluster, 'UU.|U.U|.UU', sort_by_nid=True)
@with_cluster(start_cluster=0, partitions=3, replicas=1, storage_count=3)
def testReplicationAbortedBySource(self, cluster):
......@@ -540,6 +577,29 @@ class ReplicationTests(NEOThreadedTest):
self.assertTrue(s.is_alive())
self.checkReplicas(cluster)
def testTopology(self):
"""
In addition to MasterPartitionTableTests.test_19_topology, this checks
correct propagation of the paths from storage nodes to tweak().
"""
with Patch(DatabaseManager, getTopologyPath=lambda *_: next(topology)):
for topology, expected in (
(iter("0" * 9),
'UU.......|..UU.....|....UU...|'
'......UU.|U.......U|.UU......|'
'...UU....|.....UU..|.......UU'),
(product("012", "012"),
'U..U.....|.U....U..|..U.U....|'
'.....U.U.|U.......U|.U.U.....|'
'..U...U..|....U..U.|.....U..U'),
):
with NEOCluster(replicas=1, partitions=9,
storage_count=9) as cluster:
for i, s in enumerate(cluster.storage_list, 1):
s.uuid = i
cluster.start()
self.assertPartitionTable(cluster, expected)
@with_cluster(start_cluster=0, replicas=1, storage_count=4, partitions=2)
def testTweakVsReplication(self, cluster, done=False):
S = cluster.storage_list
......@@ -675,33 +735,200 @@ class ReplicationTests(NEOThreadedTest):
self.assertEqual(2, s0.sqlCount('obj'))
expectedFailure(self.assertEqual)(2, count)
@with_cluster(start_cluster=0, replicas=1)
@with_cluster(replicas=1)
def testResumingReplication(self, cluster):
if 1:
"""
Check from where replication resumes for an OUT_OF_DATE cell that has
a hole, which is possible because OUT_OF_DATE cells are writable.
"""
ask = []
def logReplication(conn, packet):
if isinstance(packet, (Packets.AskFetchTransactions,
Packets.AskFetchObjects)):
ask.append(packet.decode()[2:])
def getTIDList():
return [t.tid for t in c.db().storage.iterator()]
s0, s1 = cluster.storage_list
cluster.start(storage_list=(s0,))
t, c = cluster.getTransaction()
r = c.root()
# s1 is UP_TO_DATE and it has the initial transaction.
# Let's outdate it: replication will have to resume just after this
# transaction, regardless of future written transactions.
# To make sure we get a hole in the cell, we block replication.
s1.stop()
cluster.join((s1,))
r._p_changed = 1
t.commit()
s1.resetNode()
with Patch(replicator.Replicator, connected=lambda *_: None):
s1.start()
self.tic()
with Patch(Replicator, connected=lambda *_: None):
cluster.enableStorageList((s1,))
cluster.neoctl.tweakPartitionTable()
r._p_changed = 1
t.commit()
self.tic()
s1.stop()
cluster.join((s1,))
t0, t1, t2 = c.db().storage.iterator()
tids = getTIDList()
s1.resetNode()
# Initialization done. Now we check that replication is correct
# and efficient.
with ConnectionFilter() as f:
f.add(logReplication)
s1.start()
self.tic()
self.assertEqual([], cluster.getOutdatedCells())
s0.stop()
cluster.join((s0,))
t0, t1, t2 = c.db().storage.iterator()
self.assertEqual(tids, getTIDList())
t0_next = add64(tids[0], 1)
self.assertEqual(ask, [
(t0_next, tids[2], tids[2:]),
(t0_next, tids[2], ZERO_OID, {tids[2]: [ZERO_OID]}),
])
@backup_test(2, backup_kw=dict(replicas=1))
def testResumingBackupReplication(self, backup):
upstream = backup.upstream
t, c = upstream.getTransaction()
r = c.root()
r[1] = PCounter()
t.commit()
r[2] = ob = PCounter()
tids = []
def newTransaction():
r._p_changed = ob._p_changed = 1
with upstream.moduloTID(0):
t.commit()
self.tic()
tids.append(r._p_serial)
def getTIDList(storage):
return storage.dm.getReplicationTIDList(tids[0], MAX_TID, 9, 0)
newTransaction()
self.assertEqual(u64(ob._p_oid), 2)
getBackupTid = backup.master.pt.getBackupTid
# Check when an OUT_OF_DATE cell has more data than an UP_TO_DATE one.
primary = backup.master.backup_app.primary_partition_dict[0]._uuid
slave, primary = sorted(backup.storage_list,
key=lambda x: x.uuid == primary)
with ConnectionFilter() as f:
@f.delayAnswerFetchTransactions
def delay(conn, x={None: 0, primary.uuid: 0}):
return x.pop(conn.getUUID(), 1)
newTransaction()
self.assertEqual(getBackupTid(), tids[1])
primary.stop()
backup.join((primary,))
primary.resetNode()
primary.start()
self.tic()
primary, slave = slave, primary
self.assertEqual(tids, getTIDList(slave))
self.assertEqual(tids[:1], getTIDList(primary))
self.assertEqual(getBackupTid(), add64(tids[1], -1))
self.assertEqual(f.filtered_count, 3)
self.tic()
self.assertEqual(4, self.checkBackup(backup))
self.assertEqual(getBackupTid(min), tids[1])
# Check that replication resumes from the maximum possible tid
# (for UP_TO_DATE cells of a backup cluster). More precisely:
# - cells are handled independently (done here by blocking replication
# of partition 1 to keep the backup TID low)
# - trans and obj are also handled independently (with FETCH_COUNT=1,
# we interrupt replication of obj in the middle of a transaction)
slave.stop()
backup.join((slave,))
ask = []
def delayReplicate(conn, packet):
if isinstance(packet, Packets.AskFetchObjects):
if len(ask) == 6:
return True
elif not isinstance(packet, Packets.AskFetchTransactions):
return
ask.append(packet.decode())
conn, = upstream.master.getConnectionList(backup.master)
with ConnectionFilter() as f, Patch(replicator.Replicator,
_nextPartitionSortKey=lambda orig, self, offset: offset):
f.add(delayReplicate)
delayReconnect = f.delayAskLastTransaction()
conn.close()
newTransaction()
newTransaction()
newTransaction()
self.assertFalse(ask)
self.assertEqual(f.filtered_count, 1)
with Patch(replicator, FETCH_COUNT=1):
f.remove(delayReconnect)
self.tic()
t1_next = add64(tids[1], 1)
self.assertEqual(ask, [
# trans
(0, 1, t1_next, tids[4], []),
(0, 1, tids[3], tids[4], []),
(0, 1, tids[4], tids[4], []),
# obj
(0, 1, t1_next, tids[4], ZERO_OID, {}),
(0, 1, tids[2], tids[4], p64(2), {}),
(0, 1, tids[3], tids[4], ZERO_OID, {}),
])
del ask[:]
max_ask = None
backup.stop()
newTransaction()
backup.start((primary,))
n = replicator.FETCH_COUNT
t4_next = add64(tids[4], 1)
self.assertEqual(ask, [
(0, n, t4_next, tids[5], []),
(0, n, tids[3], tids[5], ZERO_OID, {tids[3]: [ZERO_OID]}),
(1, n, t1_next, tids[5], []),
(1, n, t1_next, tids[5], ZERO_OID, {}),
])
self.tic()
self.assertEqual(2, self.checkBackup(backup))
@with_cluster(start_cluster=0, replicas=1)
def testStoppingDuringReplication(self, cluster):
"""
When a node is stopped while it is replicating obj from ZERO_TID,
check that replication does not resume from the beginning.
"""
s1, s2 = cluster.storage_list
cluster.start(storage_list=(s1,))
t, c = cluster.getTransaction()
r = c.root()
r._p_changed = 1
t.commit()
ltid = r._p_serial
trans = []
obj = []
with ConnectionFilter() as f, Patch(replicator, FETCH_COUNT=1):
@f.add
def delayReplicate(conn, packet):
if isinstance(packet, Packets.AskFetchTransactions):
trans.append(packet.decode()[2])
elif isinstance(packet, Packets.AskFetchObjects):
if obj:
return True
obj.append(packet.decode()[2])
s2.start()
self.tic()
cluster.neoctl.enableStorageList([s2.uuid])
cluster.neoctl.tweakPartitionTable()
self.tic()
self.assertEqual(trans, [ZERO_TID, ltid])
self.assertEqual(obj, [ZERO_TID])
self.assertPartitionTable(cluster, 'UO')
s2.stop()
cluster.join((s2,))
s2.resetNode()
del trans[:], obj[:]
s2.start()
self.tic()
self.assertEqual(trans, [ltid])
self.assertEqual(obj, [ltid])
self.assertPartitionTable(cluster, 'UU')
@with_cluster(start_cluster=0, replicas=1, partitions=2)
def testReplicationBlockedByUnfinished1(self, cluster,
......
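The replication tests above compute resume points such as `add64(tids[0], 1)`, i.e. "the TID just after this one" on packed 64-bit values. A minimal sketch of that arithmetic follows. The three functions are hypothetical stand-ins written for illustration; the real helpers live in `neo.lib.util` and may differ in detail.

```python
import struct

# Assumption: TIDs/OIDs are 8-byte big-endian unsigned integers.
_Q = struct.Struct('!Q')

def p64(n):
    """Pack an integer into an 8-byte string (stand-in for neo.lib.util.p64)."""
    return _Q.pack(n)

def u64(packed):
    """Unpack an 8-byte string into an integer (stand-in for u64)."""
    return _Q.unpack(packed)[0]

def add64(packed, offset):
    """Add a Python integer to a packed 64-bit value (stand-in for add64)."""
    return p64(u64(packed) + offset)

# TID value taken from the testStorageUpgrade dumps in this commit.
tid = p64(231616946283203125)
next_tid = add64(tid, 1)   # first TID a resumed replication would ask for
prev_tid = add64(tid, -1)  # as in add64(tids[1], -1) for the backup TID
```

Because the packing is big-endian, comparing packed values as byte strings gives the same order as comparing the integers, which is why the tests can compare TIDs directly.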
......@@ -37,6 +37,11 @@ class SSLTests(SSLMixin, test.Test):
testStorageDataLock2 = None
testUndoConflictDuringStore = None
# With MySQL, this test is expensive.
# Let's check deduplication of big oids here.
def testBasicStore(self):
super(SSLTests, self).testBasicStore(True)
def testAbortConnection(self, after_handshake=1):
with self.getLoopbackConnection() as conn:
conn.ask(Packets.Ping())
......
CREATE TABLE `bigdata` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`value` mediumblob NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `config` (
`name` varbinary(255) NOT NULL,
`value` varbinary(255) DEFAULT NULL,
PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `config` VALUES ('name','testStorageUpgrade'),('nid','1'),('partitions','3'),('ptid','9'),('replicas','1');
CREATE TABLE `data` (
`id` bigint(20) unsigned NOT NULL,
`hash` binary(20) NOT NULL,
`compression` tinyint(3) unsigned DEFAULT NULL,
`value` mediumblob NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `data` VALUES (0,0x0BEEC7B5EA3F0FDBC95D0DD47F3C5BC275DA8A33,0,0x666F6F);
CREATE TABLE `obj` (
`partition` smallint(5) unsigned NOT NULL,
`oid` bigint(20) unsigned NOT NULL,
`tid` bigint(20) unsigned NOT NULL,
`data_id` bigint(20) unsigned DEFAULT NULL,
`value_tid` bigint(20) unsigned DEFAULT NULL,
PRIMARY KEY (`partition`,`tid`,`oid`),
KEY `partition` (`partition`,`oid`,`tid`),
KEY `data_id` (`data_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `obj` VALUES (0,0,231616946283203125,0,NULL);
CREATE TABLE `pt` (
`rid` int(10) unsigned NOT NULL,
`nid` int(11) NOT NULL,
`state` tinyint(3) unsigned NOT NULL,
PRIMARY KEY (`rid`,`nid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `pt` VALUES (0,1,0),(0,2,0),(1,1,0),(1,3,1),(2,2,0),(2,3,1);
CREATE TABLE `tobj` (
`partition` smallint(5) unsigned NOT NULL,
`oid` bigint(20) unsigned NOT NULL,
`tid` bigint(20) unsigned NOT NULL,
`data_id` bigint(20) unsigned DEFAULT NULL,
`value_tid` bigint(20) unsigned DEFAULT NULL,
PRIMARY KEY (`tid`,`oid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `trans` (
`partition` smallint(5) unsigned NOT NULL,
`tid` bigint(20) unsigned NOT NULL,
`packed` tinyint(1) NOT NULL,
`oids` mediumblob NOT NULL,
`user` blob NOT NULL,
`description` blob NOT NULL,
`ext` blob NOT NULL,
`ttid` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`partition`,`tid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `trans` VALUES (1,231616946283203125,0,'\0\0\0\0\0\0\0\0','','','',231616946283203125);
CREATE TABLE `ttrans` (
`partition` smallint(5) unsigned NOT NULL,
`tid` bigint(20) unsigned NOT NULL,
`packed` tinyint(1) NOT NULL,
`oids` mediumblob NOT NULL,
`user` blob NOT NULL,
`description` blob NOT NULL,
`ext` blob NOT NULL,
`ttid` bigint(20) unsigned NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `bigdata` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`value` mediumblob NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `config` (
`name` varbinary(255) NOT NULL,
`value` varbinary(255) DEFAULT NULL,
PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `config` VALUES ('name','testStorageUpgrade'),('nid','2'),('partitions','3'),('ptid','9'),('replicas','1');
CREATE TABLE `data` (
`id` bigint(20) unsigned NOT NULL,
`hash` binary(20) NOT NULL,
`compression` tinyint(3) unsigned DEFAULT NULL,
`value` mediumblob NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `data` VALUES (0,0x0BEEC7B5EA3F0FDBC95D0DD47F3C5BC275DA8A33,0,0x666F6F);
CREATE TABLE `obj` (
`partition` smallint(5) unsigned NOT NULL,
`oid` bigint(20) unsigned NOT NULL,
`tid` bigint(20) unsigned NOT NULL,
`data_id` bigint(20) unsigned DEFAULT NULL,
`value_tid` bigint(20) unsigned DEFAULT NULL,
PRIMARY KEY (`partition`,`tid`,`oid`),
KEY `partition` (`partition`,`oid`,`tid`),
KEY `data_id` (`data_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `obj` VALUES (0,0,231616946283203125,0,NULL);
CREATE TABLE `pt` (
`rid` int(10) unsigned NOT NULL,
`nid` int(11) NOT NULL,
`state` tinyint(3) unsigned NOT NULL,
PRIMARY KEY (`rid`,`nid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `pt` VALUES (0,1,0),(0,2,0),(1,1,0),(1,3,1),(2,2,0),(2,3,1);
CREATE TABLE `tobj` (
`partition` smallint(5) unsigned NOT NULL,
`oid` bigint(20) unsigned NOT NULL,
`tid` bigint(20) unsigned NOT NULL,
`data_id` bigint(20) unsigned DEFAULT NULL,
`value_tid` bigint(20) unsigned DEFAULT NULL,
PRIMARY KEY (`tid`,`oid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `trans` (
`partition` smallint(5) unsigned NOT NULL,
`tid` bigint(20) unsigned NOT NULL,
`packed` tinyint(1) NOT NULL,
`oids` mediumblob NOT NULL,
`user` blob NOT NULL,
`description` blob NOT NULL,
`ext` blob NOT NULL,
`ttid` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`partition`,`tid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `ttrans` (
`partition` smallint(5) unsigned NOT NULL,
`tid` bigint(20) unsigned NOT NULL,
`packed` tinyint(1) NOT NULL,
`oids` mediumblob NOT NULL,
`user` blob NOT NULL,
`description` blob NOT NULL,
`ext` blob NOT NULL,
`ttid` bigint(20) unsigned NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `bigdata` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`value` mediumblob NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `config` (
`name` varbinary(255) NOT NULL,
`value` varbinary(255) DEFAULT NULL,
PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `config` VALUES ('name','testStorageUpgrade'),('nid','3'),('partitions','3'),('ptid','8'),('replicas','1');
CREATE TABLE `data` (
`id` bigint(20) unsigned NOT NULL,
`hash` binary(20) NOT NULL,
`compression` tinyint(3) unsigned DEFAULT NULL,
`value` mediumblob NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `obj` (
`partition` smallint(5) unsigned NOT NULL,
`oid` bigint(20) unsigned NOT NULL,
`tid` bigint(20) unsigned NOT NULL,
`data_id` bigint(20) unsigned DEFAULT NULL,
`value_tid` bigint(20) unsigned DEFAULT NULL,
PRIMARY KEY (`partition`,`tid`,`oid`),
KEY `partition` (`partition`,`oid`,`tid`),
KEY `data_id` (`data_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `pt` (
`rid` int(10) unsigned NOT NULL,
`nid` int(11) NOT NULL,
`state` tinyint(3) unsigned NOT NULL,
PRIMARY KEY (`rid`,`nid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `pt` VALUES (0,1,0),(0,2,0),(1,1,0),(1,3,0),(2,2,0),(2,3,0);
CREATE TABLE `tobj` (
`partition` smallint(5) unsigned NOT NULL,
`oid` bigint(20) unsigned NOT NULL,
`tid` bigint(20) unsigned NOT NULL,
`data_id` bigint(20) unsigned DEFAULT NULL,
`value_tid` bigint(20) unsigned DEFAULT NULL,
PRIMARY KEY (`tid`,`oid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `trans` (
`partition` smallint(5) unsigned NOT NULL,
`tid` bigint(20) unsigned NOT NULL,
`packed` tinyint(1) NOT NULL,
`oids` mediumblob NOT NULL,
`user` blob NOT NULL,
`description` blob NOT NULL,
`ext` blob NOT NULL,
`ttid` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`partition`,`tid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `ttrans` (
`partition` smallint(5) unsigned NOT NULL,
`tid` bigint(20) unsigned NOT NULL,
`packed` tinyint(1) NOT NULL,
`oids` mediumblob NOT NULL,
`user` blob NOT NULL,
`description` blob NOT NULL,
`ext` blob NOT NULL,
`ttid` bigint(20) unsigned NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
BEGIN TRANSACTION;
CREATE TABLE config (
name TEXT NOT NULL PRIMARY KEY,
value TEXT);
INSERT INTO "config" VALUES('name','testStorageUpgrade');
INSERT INTO "config" VALUES('nid','1');
INSERT INTO "config" VALUES('partitions','3');
INSERT INTO "config" VALUES('replicas','1');
INSERT INTO "config" VALUES('ptid','9');
CREATE TABLE data (
id INTEGER PRIMARY KEY,
hash BLOB NOT NULL,
compression INTEGER NOT NULL,
value BLOB NOT NULL);
INSERT INTO "data" VALUES(0,X'0BEEC7B5EA3F0FDBC95D0DD47F3C5BC275DA8A33',0,X'666F6F');
CREATE TABLE obj (
partition INTEGER NOT NULL,
oid INTEGER NOT NULL,
tid INTEGER NOT NULL,
data_id INTEGER,
value_tid INTEGER,
PRIMARY KEY (partition, tid, oid));
INSERT INTO "obj" VALUES(0,0,231616946283203125,0,NULL);
CREATE TABLE pt (
rid INTEGER NOT NULL,
nid INTEGER NOT NULL,
state INTEGER NOT NULL,
PRIMARY KEY (rid, nid));
INSERT INTO "pt" VALUES(0,1,0);
INSERT INTO "pt" VALUES(1,1,0);
INSERT INTO "pt" VALUES(0,2,0);
INSERT INTO "pt" VALUES(2,2,0);
INSERT INTO "pt" VALUES(1,3,1);
INSERT INTO "pt" VALUES(2,3,1);
CREATE TABLE tobj (
partition INTEGER NOT NULL,
oid INTEGER NOT NULL,
tid INTEGER NOT NULL,
data_id INTEGER,
value_tid INTEGER,
PRIMARY KEY (tid, oid));
CREATE TABLE trans (
partition INTEGER NOT NULL,
tid INTEGER NOT NULL,
packed BOOLEAN NOT NULL,
oids BLOB NOT NULL,
user BLOB NOT NULL,
description BLOB NOT NULL,
ext BLOB NOT NULL,
ttid INTEGER NOT NULL,
PRIMARY KEY (partition, tid));
INSERT INTO "trans" VALUES(1,231616946283203125,0,X'0000000000000000',X'',X'',X'',231616946283203125);
CREATE TABLE ttrans (
partition INTEGER NOT NULL,
tid INTEGER NOT NULL,
packed BOOLEAN NOT NULL,
oids BLOB NOT NULL,
user BLOB NOT NULL,
description BLOB NOT NULL,
ext BLOB NOT NULL,
ttid INTEGER NOT NULL);
CREATE INDEX _obj_i1 ON
obj(partition, oid, tid)
;
CREATE INDEX _obj_i2 ON
obj(data_id)
;
COMMIT;
BEGIN TRANSACTION;
CREATE TABLE config (
name TEXT NOT NULL PRIMARY KEY,
value TEXT);
INSERT INTO "config" VALUES('name','testStorageUpgrade');
INSERT INTO "config" VALUES('nid','2');
INSERT INTO "config" VALUES('partitions','3');
INSERT INTO "config" VALUES('replicas','1');
INSERT INTO "config" VALUES('ptid','9');
CREATE TABLE data (
id INTEGER PRIMARY KEY,
hash BLOB NOT NULL,
compression INTEGER NOT NULL,
value BLOB NOT NULL);
INSERT INTO "data" VALUES(0,X'0BEEC7B5EA3F0FDBC95D0DD47F3C5BC275DA8A33',0,X'666F6F');
CREATE TABLE obj (
partition INTEGER NOT NULL,
oid INTEGER NOT NULL,
tid INTEGER NOT NULL,
data_id INTEGER,
value_tid INTEGER,
PRIMARY KEY (partition, tid, oid));
INSERT INTO "obj" VALUES(0,0,231616946283203125,0,NULL);
CREATE TABLE pt (
rid INTEGER NOT NULL,
nid INTEGER NOT NULL,
state INTEGER NOT NULL,
PRIMARY KEY (rid, nid));
INSERT INTO "pt" VALUES(0,1,0);
INSERT INTO "pt" VALUES(1,1,0);
INSERT INTO "pt" VALUES(0,2,0);
INSERT INTO "pt" VALUES(2,2,0);
INSERT INTO "pt" VALUES(1,3,1);
INSERT INTO "pt" VALUES(2,3,1);
CREATE TABLE tobj (
partition INTEGER NOT NULL,
oid INTEGER NOT NULL,
tid INTEGER NOT NULL,
data_id INTEGER,
value_tid INTEGER,
PRIMARY KEY (tid, oid));
CREATE TABLE trans (
partition INTEGER NOT NULL,
tid INTEGER NOT NULL,
packed BOOLEAN NOT NULL,
oids BLOB NOT NULL,
user BLOB NOT NULL,
description BLOB NOT NULL,
ext BLOB NOT NULL,
ttid INTEGER NOT NULL,
PRIMARY KEY (partition, tid));
CREATE TABLE ttrans (
partition INTEGER NOT NULL,
tid INTEGER NOT NULL,
packed BOOLEAN NOT NULL,
oids BLOB NOT NULL,
user BLOB NOT NULL,
description BLOB NOT NULL,
ext BLOB NOT NULL,
ttid INTEGER NOT NULL);
CREATE INDEX _obj_i1 ON
obj(partition, oid, tid)
;
CREATE INDEX _obj_i2 ON
obj(data_id)
;
COMMIT;
BEGIN TRANSACTION;
CREATE TABLE config (
name TEXT NOT NULL PRIMARY KEY,
value TEXT);
INSERT INTO "config" VALUES('name','testStorageUpgrade');
INSERT INTO "config" VALUES('nid','3');
INSERT INTO "config" VALUES('partitions','3');
INSERT INTO "config" VALUES('replicas','1');
INSERT INTO "config" VALUES('ptid','8');
CREATE TABLE data (
id INTEGER PRIMARY KEY,
hash BLOB NOT NULL,
compression INTEGER NOT NULL,
value BLOB NOT NULL);
CREATE TABLE obj (
partition INTEGER NOT NULL,
oid INTEGER NOT NULL,
tid INTEGER NOT NULL,
data_id INTEGER,
value_tid INTEGER,
PRIMARY KEY (partition, tid, oid));
CREATE TABLE pt (
rid INTEGER NOT NULL,
nid INTEGER NOT NULL,
state INTEGER NOT NULL,
PRIMARY KEY (rid, nid));
INSERT INTO "pt" VALUES(0,1,0);
INSERT INTO "pt" VALUES(0,2,0);
INSERT INTO "pt" VALUES(1,1,0);
INSERT INTO "pt" VALUES(2,2,0);
INSERT INTO "pt" VALUES(1,3,0);
INSERT INTO "pt" VALUES(2,3,0);
CREATE TABLE tobj (
partition INTEGER NOT NULL,
oid INTEGER NOT NULL,
tid INTEGER NOT NULL,
data_id INTEGER,
value_tid INTEGER,
PRIMARY KEY (tid, oid));
CREATE TABLE trans (
partition INTEGER NOT NULL,
tid INTEGER NOT NULL,
packed BOOLEAN NOT NULL,
oids BLOB NOT NULL,
user BLOB NOT NULL,
description BLOB NOT NULL,
ext BLOB NOT NULL,
ttid INTEGER NOT NULL,
PRIMARY KEY (partition, tid));
CREATE TABLE ttrans (
partition INTEGER NOT NULL,
tid INTEGER NOT NULL,
packed BOOLEAN NOT NULL,
oids BLOB NOT NULL,
user BLOB NOT NULL,
description BLOB NOT NULL,
ext BLOB NOT NULL,
ttid INTEGER NOT NULL);
CREATE INDEX _obj_i1 ON
obj(partition, oid, tid)
;
CREATE INDEX _obj_i2 ON
obj(data_id)
;
COMMIT;
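In the dumps above, each `data` row pairs a 20-byte `hash` with its `value`. The single row present can be checked by hand: `0x666F6F` is the ASCII string `foo`, and the stored hash is its SHA-1 digest. A small sketch of that check (the variable names are illustrative only):

```python
import binascii
import hashlib

# Values copied from the INSERT INTO `data` statements above.
stored_hash = '0BEEC7B5EA3F0FDBC95D0DD47F3C5BC275DA8A33'
stored_value = binascii.unhexlify('666F6F')

# The row has compression=0, so the value is hashed as stored.
computed_hash = hashlib.sha1(stored_value).hexdigest().upper()
matches = computed_hash == stored_hash
```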
from __future__ import print_function
import os
import signal
import tempfile
import ZEO.runzeo
from ZEO.ClientStorage import ClientStorage as _ClientStorage
from . import buildUrlFromString, ADDRESS_TYPE, IP_VERSION_FORMAT_DICT
from .functional import AlreadyStopped, PortAllocator, Process
class ZEOProcess(Process):
def __init__(self, **kw):
super(ZEOProcess, self).__init__('runzeo', kw)
def run(self):
from ZEO.runzeo import ZEOServer
del ZEOServer.handle_sigusr2
getattr(ZEO, self.command).main()
class ClientStorage(_ClientStorage):
@property
def restore(self):
raise AttributeError('IStorageRestoreable disabled')
class ZEOCluster(object):
def start(self):
self.zodb_storage_list = []
local_ip = IP_VERSION_FORMAT_DICT[ADDRESS_TYPE]
port_allocator = PortAllocator()
port = port_allocator.allocate(ADDRESS_TYPE, local_ip)
self.address = buildUrlFromString(local_ip), port
temp_dir = tempfile.mkdtemp(prefix='neo_')
print('Using temp directory', temp_dir)
self.zeo = ZEOProcess(address='%s:%s' % self.address,
filename=os.path.join(temp_dir, 'Data.fs'))
port_allocator.release()
self.zeo.start()
def stop(self):
storage_list = self.zodb_storage_list
zeo = self.zeo
del self.zeo, self.zodb_storage_list
try:
for storage in storage_list:
storage.close()
zeo.kill(signal.SIGUSR2)
except AlreadyStopped:
pass
else:
zeo.child_coverage()
zeo.kill(signal.SIGKILL)
zeo.wait()
def getZODBStorage(self):
storage = ClientStorage(self.address)
self.zodb_storage_list.append(storage)
return storage
def setupDB(self):
pass
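The `ClientStorage` subclass above hides `restore` by turning it into a property that raises `AttributeError`. A standalone sketch of why this works, using hypothetical class names unrelated to ZEO: `hasattr()` reports an attribute as missing when looking it up raises `AttributeError`, so callers that probe for the method fall back to the non-restoring code path.

```python
class WithRestore(object):
    """Hypothetical storage that offers a restore() method."""
    def restore(self):
        return 'restored'

class WithoutRestore(WithRestore):
    """Same storage, with restore() masked as in the diff above."""
    @property
    def restore(self):
        raise AttributeError('restore disabled')

has_before = hasattr(WithRestore(), 'restore')
has_after = hasattr(WithoutRestore(), 'restore')
```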
......@@ -38,7 +38,7 @@ extras_require = {
'master': [],
'storage-sqlite': [],
'storage-mysqldb': ['mysqlclient'],
'storage-importer': zodb_require,
'storage-importer': zodb_require + ['msgpack>=0.5.6', 'setproctitle'],
}
extras_require['tests'] = ['coverage', 'zope.testing', 'psutil>=2',
'neoppod[%s]' % ', '.join(extras_require)]
......@@ -60,7 +60,7 @@ else:
setup(
name = 'neoppod',
version = '1.9',
version = '1.10',
description = __doc__.strip(),
author = 'Nexedi SA',
author_email = 'neo-dev@erp5.org',
......
#!/usr/bin/env python
from __future__ import print_function
import sys
import os
import math
......@@ -17,6 +17,7 @@ class MatrixImportBenchmark(BenchmarkRunner):
def add_options(self, parser):
parser.add_option('-d', '--datafs')
parser.add_option('-z', '--zeo', action="store_true")
parser.add_option('', '--min-storages', type='int', default=1)
parser.add_option('', '--max-storages', type='int', default=2)
parser.add_option('', '--min-replicas', type='int', default=0)
......@@ -33,6 +34,7 @@ class MatrixImportBenchmark(BenchmarkRunner):
min_r = options.min_replicas,
max_r = options.max_replicas,
threaded = options.threaded,
zeo = options.zeo,
)
def start(self):
......@@ -47,30 +49,36 @@ class MatrixImportBenchmark(BenchmarkRunner):
if storages[-1] < max_s:
storages.append(max_s)
replicas = range(min_r, max_r + 1)
result_list = [self.runMatrix(storages, replicas)
for x in xrange(self._config.repeat)]
results = {}
for s in storages:
results[s] = z = {}
for r in replicas:
if r < s:
x = [x[s][r] for x in result_list if x[s][r] is not None]
if x:
z[r] = min(x)
def merge_min(a, b):
for k, vb in b.iteritems():
try:
va = a[k]
except KeyError:
pass
else:
z[r] = None
if type(va) is dict:
merge_min(va, vb)
continue
if vb is None or None is not va <= vb:
continue
a[k] = vb
for x in xrange(self._config.repeat):
merge_min(results, self.runMatrix(storages, replicas))
return self.buildReport(storages, replicas, results)
def runMatrix(self, storages, replicas):
stats = {}
if self._config.zeo:
stats['zeo'] = self.runImport()
for s in storages:
stats[s] = z = {}
for r in replicas:
if r < s:
z[r] = self.runImport(1, s, r, 100)
z[r] = self.runImport(1, s, r, 12*s//(1+r))
return stats
def runImport(self, masters, storages, replicas, partitions):
def runImport(self, *neo_args):
datafs = self._config.datafs
if datafs:
dfs_storage = FileStorage(file_name=self._config.datafs)
......@@ -79,28 +87,36 @@ class MatrixImportBenchmark(BenchmarkRunner):
import random, neo.tests.stat_zodb
dfs_storage = getattr(neo.tests.stat_zodb, datafs)(
random.Random(0)).as_storage(5000)
print "Import of %s with m=%s, s=%s, r=%s, p=%s" % (
datafs, masters, storages, replicas, partitions)
info = "Import of " + datafs
if neo_args:
masters, storages, replicas, partitions = neo_args
info += " with m=%s, s=%s, r=%s, p=%s" % (
masters, storages, replicas, partitions)
if self._config.threaded:
from neo.tests.threaded import NEOCluster
else:
from neo.tests.functional import NEOCluster
neo = NEOCluster(
zodb = NEOCluster(
db_list=['%s_matrix_%u' % (DB_PREFIX, i) for i in xrange(storages)],
clear_databases=True,
master_count=masters,
partitions=partitions,
replicas=replicas,
)
else:
from neo.tests.zeo_cluster import ZEOCluster
info += " with ZEO"
zodb = ZEOCluster()
print(info)
try:
neo.start()
zodb.start()
try:
neo_storage = neo.getZODBStorage()
if not self._config.threaded:
assert len(neo.getStorageList()) == storages
neo.expectOudatedCells(number=0)
storage = zodb.getZODBStorage()
if neo_args and not self._config.threaded:
assert len(zodb.getStorageList()) == storages
zodb.expectOudatedCells(number=0)
start = time()
neo_storage.copyTransactionsFrom(dfs_storage)
storage.copyTransactionsFrom(dfs_storage)
end = time()
size = dfs_storage.getSize()
if self._size is None:
......@@ -108,15 +124,14 @@ class MatrixImportBenchmark(BenchmarkRunner):
else:
assert self._size == size
finally:
neo.stop()
zodb.stop()
# Clear DB if no error happened.
neo.setupDB()
zodb.setupDB()
return end - start
except:
traceback.print_exc()
self.error_log += "Import with m=%s, s=%s, r=%s, p=%s:" % (
masters, storages, replicas, partitions)
self.error_log += "\n%s\n" % ''.join(traceback.format_exc())
self.error_log += "%s:\n%s\n" % (
info, ''.join(traceback.format_exc()))
def buildReport(self, storages, replicas, results):
# draw an array with results
......@@ -130,6 +145,14 @@ class MatrixImportBenchmark(BenchmarkRunner):
report += sep
failures = 0
speedlist = []
if self._config.zeo:
result = results['zeo']
if result is None:
result = 'FAIL'
failures += 1
else:
result = '%.1f kB/s' % (dfs_size / (result * 1e3))
self.add_status('ZEO', result)
for s in storages:
values = []
assert s in results
......@@ -151,7 +174,7 @@ class MatrixImportBenchmark(BenchmarkRunner):
if failures:
info = '%d failures' % (failures, )
else:
info = '%.1f KB/s' % (sum(speedlist) / len(speedlist))
info = '%.1f kB/s' % (sum(speedlist) / len(speedlist))
return info, report
def main(args=None):
......
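The new `merge_min` helper in `MatrixImportBenchmark.start` folds each run's nested result dict into `results`, keeping the smallest duration per cell and treating `None` as a failed run. Below is a standalone sketch of the same idea, rewritten so it also runs on Python 3 (`items()` instead of `iteritems()`, and the chained comparison spelled out). It is an illustration, not the committed code.

```python
def merge_min(a, b):
    """Merge nested dict b into a, keeping the minimum non-None leaf."""
    for key, vb in b.items():
        if key in a:
            va = a[key]
            if isinstance(va, dict):
                # Recurse into per-storage dicts keyed by replica count.
                merge_min(va, vb)
                continue
            # Keep the existing value if the new run failed,
            # or if the existing value is already at least as good.
            if vb is None or (va is not None and va <= vb):
                continue
        a[key] = vb

results = {}
# Two hypothetical runs: durations in seconds, None for a failed import.
merge_min(results, {'zeo': 3.0, 1: {0: 2.0}, 2: {0: None, 1: 4.0}})
merge_min(results, {'zeo': None, 1: {0: 1.5}, 2: {0: 5.0, 1: 4.5}})
```

A cell stays `None` only if every run failed for it, which is what `buildReport` then reports as `FAIL`.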