Commit e029a0cd authored by Jim Fulton's avatar Jim Fulton

This is superceded by the storage interfaces in ZODB.interfaces.

parent 7bedfa43
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Copyright (c) 2001, 2002, 2003 Zope Corporation and Contributors.
% All Rights Reserved.
%
% This software is subject to the provisions of the Zope Public License,
% Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution.
% THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
% WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
% WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
% FOR A PARTICULAR PURPOSE.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass{howto}
\title{ZODB Storage API}
\release{1.00}
\author{Zope Corporation}
\authoraddress{
Lafayette Technology Center\\
513 Prince Edward Street\\
Fredericksburg, VA 22401\\
\url{http://www.zope.com/}
}
\begin{document}
\maketitle
\begin{abstract}
\noindent
A ZODB storage provides the low-level storage for ZODB transactions.
Examples include FileStorage, OracleStorage, and bsddb3Storage. The
storage API handles storing and retrieving individual objects in a
transaction-specifc way. It also handles operations like pack and
undo. This document describes the interface implemented by storages.
\end{abstract}
\tableofcontents
\section{Concepts}
\subsection{Versions}
Versions provide support for long-running transactions. They extend
transaction semantics, such as atomicity and serializability, to
computation that involves many basic transactions, spread over long
periods of time, which may be minutes, or years.
Versions were motivated by a common problem in website management,
but may be useful in other domains as well. Often, a website must be
changed in such a way that changes, which may require many operations
over a period of time, must not be visible until completed and
approved. Typically this problem is solved through the use of
\dfn{staging servers}. Essentially, two copies of a website are
maintained. Work is performed on a staging server. When work is
completed, the entire site is copied from the staging server to the
production server. This process is too resource intensive and too
monolithic. It is not uncommon for separate unrelated changes to be
made to a website and these changes will need to be copied to the
production server independently. This requires an unreasonable amount
of coordination, or multiple staging servers.
ZODB addresses this problem through long-running transactions, called
\dfn{versions}. Changes made to a website can be made to a version
(of the website). The author sees the version of the site that
reflects the changes, but people working outside of the version cannot
see the changes. When the changes are completed and approved, they
can be saved, making them visible to others, almost instantaneously.
Versions require support from storage managers. Version support is an
optional feature of storage managers and support in a particular
database will depend on support in the underlying storage manager.
\section{Storage Interface}
General issues:
The objects are stored as Python pickles. The pickle format is
important, because various parts of ZODB depend on it, e.g. pack.
Conflict resolution
Various versions of the interface.
Concurrency and transactions.
The various exceptions that can be raised.
An object that implements the \class{Storage} interface must support
the following methods:
\begin{methoddesc}{tpc_begin}{transaction\optional{, tid\optional{,
status}}}
Begin the two-phase commit for \var{transaction}.
This method blocks until the storage is in the not committing state,
and then places the storage in the committing state. If the storage
is in the committing state and the given transaction is the
transaction that is already being committed, then the call does not
block and returns immediately without any effect.
The optional \var{tid} argument specifies the timestamp to be used
for the transaction ID and the new object serial numbers. If it is
not specified, the implementation chooses the timestamp.
The optional \var{status} argument, which has a default value of
\code{'~'}, has something to do with copying transactions.
\end{methoddesc}
\begin{methoddesc}{store}{oid, serial, data, version, transaction}
Store \var{data}, a Python pickle, for the object ID identified by
\var{oid}. A Storage need not and often will not write data
immediately. If data are written, then the storage should be
prepared to undo the write if a transaction is aborted.
The value of \var{serial} is opaque; it should be the value returned
by the \method{load()} call that read the object. \var{version} is
a string that identifies the version or the empty string.
\var{transaction}, an instance of
\class{ZODB.Transaction.Transaction}, is the current transaction.
The current transaction is the transaction passed to the most recent
\method{tpc_begin()} call.
There are several possible return values, depending in part on
whether the storage writes the data immediately. The return value
will be one of:
\begin{itemize}
\item \code{None}, indicating the data has not been stored yet
\item a string, containing the new serial number for the
object
\item a sequence of object ID, serial number pairs, containing the
new serial numbers for objects updated by earlier
\method{store()} calls that are part of this transaction.
If the serial number is not a string, it is an exception
object that should be raised by the caller.
\note{This explanation is confusing; how to tell the
sequence of pairs from the exception? Barry, Jeremy, please
clarify here.}
\end{itemize}
Several different exceptions can be raised when an error occurs.
\begin{itemize}
\item \exception{ConflictError} is raised when \var{serial}
does not match the most recent serial number for object
\var{oid}.
\item \exception{VersionLockError} is raised when object
\var{oid} is locked in a version and the \var{version}
argument contains a different version name or is empty.
\item \exception{StorageTransactionError} is raised when
\var{transaction} does not match the current transaction.
\item \exception{StorageError} or, more often, a subclass of
it, is raised when an internal error occurs while the
storage is handling the \method{store()} call.
\end{itemize}
\end{methoddesc}
\begin{methoddesc}{restore}{oid, serial, data, version, transaction}
A lot like \method{store()} but without all the consistency checks.
This should only be used when we \emph{know} the data is good, hence
the method name. While the signature looks like \method{store()},
there are some differences:
\begin{itemize}
\item \var{serial} is the serial number of this revision, not
of the previous revision. It is used instead of
\code{self._serial}, which is ignored.
\item Nothing is returned.
\item \var{data} can be \code{None}, which indicates a George
Bailey object (one who's creation has been transactionally
undone).
\end{itemize}
\end{methoddesc}
\begin{methoddesc}{new_oid}{}
XXX
\end{methoddesc}
\begin{methoddesc}{tpc_vote}{transaction}
XXX
\end{methoddesc}
\begin{methoddesc}{tpc_finish}{transaction, func}
Finish the transaction, making any transaction changes
permanent. Changes must be made permanent at this point.
If \var{transaction} is not the current transaction, nothing
happens.
\var{func} is called with no arguments while the storage lock is
held, but possibly before the updated date is made durable. This
argument exists to support the \class{Connection} object's
invalidation protocol.
\end{methoddesc}
\begin{methoddesc}{abortVersion}{version, transaction}
Clear any changes made by the given version. \var{version} is the
version to be aborted; it may not be the empty string.
\var{transaction} is the current transaction.
This method is state dependent. It is an error to call this method
if the storage is not committing, or if the given transaction is not
the transaction given in the most recent \method{tpc_begin()}.
If undo is not supported, then version data may be simply
discarded. If undo is supported, however, then the
\method{abortVersion()} operation must be undoable, which implies
that version data must be retained. Use the \method{supportsUndo()}
method to determine if the storage supports the undo operation.
\end{methoddesc}
\begin{methoddesc}{commitVersion}{source, destination, transaction}
Store changes made in the \var{source} version into the
\var{destination} version. A \exception{VersionCommitError} is
raised if the \var{source} and \var{destination} are equal or if
\var{source} is an empty string. The \var{destination} may be an
empty string, in which case the data are saved to non-version
storage.
This method is state dependent. It is an error to call this method
if the storage is not committing, or if the given transaction is not
the transaction given in the most recent \method{tpc_begin()}.
If the storage doesn't support undo, then the old version data may
be discarded. If undo is supported, then this operation must be
undoable and old transaction data may not be discarded. Use the
\method{supportsUndo()} method to determine if the storage supports
the undo operation.
\end{methoddesc}
\begin{methoddesc}{close}{}
Finalize the storage, releasing any external resources. The storage
should not be used after this method is called.
\end{methoddesc}
\begin{methoddesc}{lastSerial}{oid}
Returns the serial number for the last committed transaction for the
object identified by \var{oid}. If there is no serial number for
\var{oid} --- which can only occur if it represents a new object ---
returns \code{None}.
\note{This is not defined for \class{ZODB.BaseStorage}.}
\end{methoddesc}
\begin{methoddesc}{lastTransaction}{}
Return transaction ID for last committed transaction.
\note{This is not defined for \class{ZODB.BaseStorage}.}
\end{methoddesc}
\begin{methoddesc}{getName}{}
Returns the name of the store. The format and interpretation of
this name is storage dependent. It could be a file name, a database
name, etc.
\end{methoddesc}
\begin{methoddesc}{getSize}{}
An approximate size of the database, in bytes.
\end{methoddesc}
\begin{methoddesc}{getSerial}{oid}
Return the serial number of the most recent version of the object
identified by \var{oid}.
\end{methoddesc}
\begin{methoddesc}{load}{oid, version}
Returns the pickle data and serial number for the object identified
by \var{oid} in the version \var{version}.
\end{methoddesc}
\begin{methoddesc}{loadSerial}{oid, serial}
Load a historical version of the object identified by \var{oid}
having serial number \var{serial}.
\end{methoddesc}
\begin{methoddesc}{modifiedInVersion}{oid}
Returns the version that the object with identifier \var{oid} was
modified in, or an empty string if the object was not modified in a
version.
\end{methoddesc}
\begin{methoddesc}{isReadOnly}{}
Returns true if the storage is read-only, otherwise returns false.
\end{methoddesc}
\begin{methoddesc}{supportsTransactionalUndo}{}
Returns true if the storage implementation supports transactional
undo, or false if it does not.
\note{This is not defined for \class{ZODB.BaseStorage}.}
\end{methoddesc}
\begin{methoddesc}{supportsUndo}{}
Returns true if the storage implementation supports undo, or false
if it does not.
\end{methoddesc}
\begin{methoddesc}{supportsVersions}{}
Returns true if the storage implementation supports versions, or
false if it does not.
\end{methoddesc}
\begin{methoddesc}{transactionalUndo}{transaction_id, transaction}
Undo a transaction specified by \var{transaction_id}. This may need
to do conflict resolution.
\note{This is not defined for \class{ZODB.BaseStorage}.}
\end{methoddesc}
\begin{methoddesc}{undo}{transaction_id}
Undo the transaction corresponding to the transaction ID given by
\var{transaction_id}. If the transaction cannot be undone, then
\exception{UndoError} is raised. On success, returns a sequence of
object IDs that were affected.
\end{methoddesc}
\begin{methoddesc}{undoInfo}{XXX}
XXX
\end{methoddesc}
\begin{methoddesc}{undoLog}{\optional{first\optional{,
last\optional{, filter}}}}
Returns a sequence of \class{TransactionDescription} objects for
undoable transactions. \var{first} gives the index of the first
transaction to be retured, with \code{0} (the default) being the
most recent.
\note{\var{last} is confusing; can Barry or Jeremy try to explain
this?}
If \var{filter} is provided and not \code{None}, it must be a
function which accepts a \class{TransactionDescription} object as a
parameter and returns true if the entry should be reported. If
omitted or \code{None}, all entries are reported.
\end{methoddesc}
\begin{methoddesc}{versionEmpty}{version}
Return true if there are no transactions for the specified version.
\end{methoddesc}
\begin{methoddesc}{versions}{\optional{max}}
Return a sequence of the versions stored in the storage. If
\var{max} is given, the implementation may choose not to return more
than \var{max} version names.
\end{methoddesc}
\begin{methoddesc}{history}{oid\optional{, version\optional{,
size\optional{, filter}}}}
Return a sequence of \class{HistoryEntry} objects. The information
provides a log of the changes made to the object. Data are reported
in reverse chronological order. If \var{version} is given, history
information is given with respect to the specified version, or only
the non-versioned changes if the empty string is given. By default,
all changes are reported. The number of history entries reported is
constrained by \var{size}, which defaults to \code{1}. If
\var{filter} is provided and not \code{None}, it must be a function
which accepts a \class{HistoryEntry} object as a parameter and
returns true if the entry should be reported. If omitted or
\code{None}, all entries are reported.
\end{methoddesc}
\begin{methoddesc}{pack}{t, referencesf}
Remove transactions from the database that are no longer needed to
maintain the current state of the database contents. Undo will not
be restore objects to states from before the most recent call to
\method{pack()}.
\end{methoddesc}
\begin{methoddesc}{copyTransactionsFrom}{other\optional{, verbose}}
Copy transactions from another storage, given by \var{other}. This
is typically used when converting a database from one storage
implementation to another. This will use \method{restore()} if
available, but will use \method{store()} if \method{restore()} is
not available. When \method{store()} is needed, this may fail with
\exception{ConflictError} or \exception{VersionLockError}.
\end{methoddesc}
\begin{methoddesc}{iterator}{\optional{start\optional{, stop}}}
Return a iterable object which produces all the transactions from a
range. If \var{start} is given and not \code{None}, transactions
which occurred before the identified transaction are ignored. If
\var{stop} is given and not \code{None}, transactions which occurred
after the identified transaction are ignored; the specific
transaction identified by \var{stop} will be included in the series
of transactions produced by the iterator.
\note{This is not defined for \class{ZODB.BaseStorage}.}
\end{methoddesc}
\begin{methoddesc}{registerDB}{db, limit}
Register a database \var{db} for distributed storage invalidation
messages. The maximum number of objects to invalidate is given by
\var{limit}. If more objects need to be invalidated than this
limit, then all objects are invalidated. This argument may be
\code{None}, in which case no limit is set. Non-distributed
storages should treat this is a null operation. Storages should
work correctly even if this method is not called.
\end{methoddesc}
\section{ZODB.BaseStorage Implementation}
\section{Notes for Storage Implementors}
\section{Distributed Storage Interface}
Distributed storages support use with multiple application processes.
Distributed storages have a storage instance per application and some
sort of central storage server that manages data on behalf of the
individual storage instances.
When a process changes an object, the object must be invaidated in all
other processes using the storage. The central storage sends a
notification message to the other storage instances, which, in turn,
send invalidation messages to their respective databases.
\end{document}
...@@ -322,10 +322,14 @@ class IStorageDB(Interface): ...@@ -322,10 +322,14 @@ class IStorageDB(Interface):
there would be so many that it would be inefficient to do so. there would be so many that it would be inefficient to do so.
""" """
def invalidate(transaction_id, oids): def invalidate(transaction_id, oids, version=''):
"""Invalidate object ids committed by the given transaction """Invalidate object ids committed by the given transaction
The oids argument is an iterable of object identifiers. The oids argument is an iterable of object identifiers.
The version argument is provided for backward
compatibility. If passed, it must be an empty string.
""" """
def references(record, oids=None): def references(record, oids=None):
...@@ -339,10 +343,11 @@ class IStorageDB(Interface): ...@@ -339,10 +343,11 @@ class IStorageDB(Interface):
class IDatabase(IStorageDB): class IDatabase(IStorageDB):
"""ZODB DB. """ZODB DB.
TODO: This interface is incomplete.
""" """
# TODO: This interface is incomplete.
# XXX how is it incomplete?
databases = Attribute( databases = Attribute(
"""A mapping from database name to DB (database) object. """A mapping from database name to DB (database) object.
...@@ -423,13 +428,16 @@ class IStorage(Interface): ...@@ -423,13 +428,16 @@ class IStorage(Interface):
def close(): def close():
"""Close the storage. """Close the storage.
Finalize the storage, releasing any external resources. The
storage should not be used after this method is called.
""" """
def getName(): def getName():
"""The name of the storage """The name of the storage
The format and interpretation of this name is storage The format and interpretation of this name is storage
dependent. It could be a file name, a database name, etc. dependent. It could be a file name, a database name, etc..
This is used soley for informational purposes. This is used soley for informational purposes.
""" """
...@@ -440,7 +448,7 @@ class IStorage(Interface): ...@@ -440,7 +448,7 @@ class IStorage(Interface):
This is used soley for informational purposes. This is used soley for informational purposes.
""" """
def history(oid, size=1): def history(oid, version='', size=1):
"""Return a sequence of history information dictionaries. """Return a sequence of history information dictionaries.
Up to size objects (including no objects) may be returned. Up to size objects (including no objects) may be returned.
...@@ -453,20 +461,31 @@ class IStorage(Interface): ...@@ -453,20 +461,31 @@ class IStorage(Interface):
time time
UTC seconds since the epoch (as in time.time) that the UTC seconds since the epoch (as in time.time) that the
object revision was committed. object revision was committed.
tid tid
The transaction identifier of the transaction that The transaction identifier of the transaction that
committed the version. committed the version.
serial
An alias for tid, which expected by older clients.
user_name user_name
The user identifier, if any (or an empty string) of the The user identifier, if any (or an empty string) of the
user on whos behalf the revision was committed. user on whos behalf the revision was committed.
description description
The transaction description for the transaction that The transaction description for the transaction that
committed the revision. committed the revision.
size size
The size of the revision data record. The size of the revision data record.
If the transaction had extension items, then these items are If the transaction had extension items, then these items are
also included if they don't conflict with the keys above. also included if they don't conflict with the keys above.
The version argument is provided for backward
compatibility. It should always be an empty string.
""" """
def isReadOnly(): def isReadOnly():
...@@ -476,6 +495,11 @@ class IStorage(Interface): ...@@ -476,6 +495,11 @@ class IStorage(Interface):
same value. Read-only-ness is a static property of a storage. same value. Read-only-ness is a static property of a storage.
""" """
# XXX Note that this method doesn't really buy us much,
# especially since we have to account for the fact that a
# ostensibly non-read-only storage may be read-only
# transiently. It would be better to just have read-only errors.
def lastTransaction(): def lastTransaction():
"""Return the id of the last committed transaction """Return the id of the last committed transaction
""" """
...@@ -486,9 +510,13 @@ class IStorage(Interface): ...@@ -486,9 +510,13 @@ class IStorage(Interface):
This is used soley for informational purposes. This is used soley for informational purposes.
""" """
def load(oid): def load(oid, version):
"""Load data for an object id """Load data for an object id
The version argumement should always be an empty string. It
exists soley for backward compatibility with older storage
implementations.
A data record and serial are returned. The serial is a A data record and serial are returned. The serial is a
transaction identifier of the transaction that wrote the data transaction identifier of the transaction that wrote the data
record. record.
...@@ -566,7 +594,7 @@ class IStorage(Interface): ...@@ -566,7 +594,7 @@ class IStorage(Interface):
has a reasonable chance of being unique. has a reasonable chance of being unique.
""" """
def store(oid, serial, data, transaction): def store(oid, serial, data, version, transaction):
"""Store data for the object id, oid. """Store data for the object id, oid.
Arguments: Arguments:
...@@ -585,12 +613,15 @@ class IStorage(Interface): ...@@ -585,12 +613,15 @@ class IStorage(Interface):
data data
The data record. This is opaque to the storage. The data record. This is opaque to the storage.
version
This must be an empty string. It exists for backward compatibility.
transaction transaction
A transaction object. This should match the current A transaction object. This should match the current
transaction for the storage, set by tpc_begin. transaction for the storage, set by tpc_begin.
The new serial for the object is returned, but not necessarily The new serial for the object is returned, but not necessarily
immediately. It may be returned directly, or un a subsequent immediately. It may be returned directly, or on a subsequent
store or tpc_vote call. store or tpc_vote call.
The return value may be: The return value may be:
...@@ -607,6 +638,21 @@ class IStorage(Interface): ...@@ -607,6 +638,21 @@ class IStorage(Interface):
ZODB.ConflictResolution.ResolvedSerial to indicate that a ZODB.ConflictResolution.ResolvedSerial to indicate that a
conflict occured and that the object should be invalidated. conflict occured and that the object should be invalidated.
Several different exceptions may be raised when an error occurs.
ConflictError
is raised when serial does not match the most recent serial
number for object oid and the conflict was not resolved by
the storage.
StorageTransactionError
is raised when transaction does not match the current
transaction.
StorageError or, more often, a subclass of it
is raised when an internal error occurs while the storage is
handling the store() call.
""" """
def tpc_abort(transaction): def tpc_abort(transaction):
...@@ -667,6 +713,13 @@ class IStorage(Interface): ...@@ -667,6 +713,13 @@ class IStorage(Interface):
""" """
class IStorageRestoreable(IStorage): class IStorageRestoreable(IStorage):
"""Copying Transactions
The IStorageRestoreable interface supports copying
already-committed transactions from one storage to another. This
is typically done for replication or for moving data from one
storage implementation to another.
"""
def tpc_begin(transaction, tid=None): def tpc_begin(transaction, tid=None):
"""Begin the two-phase commit process. """Begin the two-phase commit process.
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment