Commit ed365135 authored by Jim Fulton

Documentation on transactions and threading.

This adds documentation on transactions and threading concerns.

There are some topics, like application design and conflict resolution,
that might want a deeper treatment, but would probably be better
handled through articles or dedicated topics. (I want to avoid
individual topics being too long or weedy to read, where practical.)

Writing this, I stumbled a bit over thread-local transaction managers.
For most applications, they don't add anything over accessing
transaction managers on connections and actually provide an
opportunity to fail. I'm convinced that it should be possible to do
most transaction management through connections and that the API
provided by transaction managers and the transaction package should be
reserved for distributed transactions.

I didn't mention gevent. I think there should be a section on gevent,
but I think it should be written by someone who's used gevent with
ZODB. :)

Maybe there should also be a section or mention of using asyncio with
ZODB, or maybe later.

Closes zopefoundation/zodbdocs#13
Closes zopefoundation/zodbdocs#16
parent d6b60a5b
......@@ -2,7 +2,7 @@
Change History
================
5.0.1 (unreleased)
5.0.1 (2016-09-09)
==================
- Fix an AttributeError that DemoStorage could raise if it was asked
......@@ -12,6 +12,8 @@
- Call _p_resolveConflict() even if a conflicting change doesn't change the
state. This reverts to the behaviour of 3.10.3 and older.
- Many docstrings have been improved.
5.0.0 (2016-09-06)
==================
......
......@@ -435,7 +435,7 @@ new schema. This can be easy if your network of object references is quite
structured, making it easy to find all the instances of the class being
modified. For example, if all :class:`User` objects can be found inside a
single dictionary or BTree, then it would be a simple matter to loop over every
:class:`User` instance with a :keyword:`for` statement. This is more difficult
:class:`User` instance with a ``for`` statement. This is more difficult
if your object graph is less structured; if :class:`User` objects can be found
as attributes of any number of different class instances, then there's no longer
any easy way to find them all, short of writing a generalized object traversal
......
......@@ -14,13 +14,11 @@ If you haven't yet, you should read the :ref:`Tutorial <tutorial-label>`.
install-and-run
writing-persistent-objects.rst
transactions-and-threading
.. todo:
transaction.rst
storages.rst
configuration.rst
threading.rst
packing-and-garbage-collection.rst
blobs.rst
multi-databases.rst
blobs
packing-and-garbage-collection
multi-databases
blobs
......@@ -128,6 +128,8 @@ much of anything. Connections take care of loading and saving objects
and manage object caches. Each connection has its own cache
[#caches-are-expensive]_.
.. _getting-connections:
Getting connections
-------------------
......@@ -144,7 +146,7 @@ db.open()
done using the connection.
If changes are made, the application :ref:`commits transactions
<commit-transactions>` to make them permanent.
<using-transactions-label>` to make them permanent.
db.transaction()
The database :meth:`~ZODB.DB.transaction` method
......
============================
Transactions and concurrency
============================
.. contents::
`Transactions <https://en.wikipedia.org/wiki/Database_transaction>`_
are a core feature of ZODB. Much has been written about transactions,
and we won't go into much detail here. Transactions provide 2 core
benefits:
Atomicity
When a transaction executes, it succeeds or fails completely. If
some data are updated and then an error occurs, causing the
transaction to fail, the updates are rolled back automatically. The
application using the transactional system doesn't have to undo
partial changes. This takes a significant burden from developers
and increases the reliability of applications.
Concurrency
Transactions provide a way of managing concurrent updates to data.
Different programs operate on the data independently, without having
to use low-level techniques to moderate their access. Coordination
and synchronization happen via transactions.
.. _using-transactions-label:
Using transactions
==================
All activity in ZODB happens in the context of database connections
and transactions. Here's a simple example::
import ZODB, transaction
db = ZODB.DB(None) # Use a mapping storage
conn = db.open()
conn.root.x = 1
transaction.commit()
.. -> src
>>> exec(src)
In the example above, we used ``transaction.commit()`` to commit a
transaction, making the change to ``conn.root`` permanent. This is
the most common way to use ZODB, at least historically.
If we decide we don't want to commit a transaction, we can use
``abort``::
conn.root.x = 2
transaction.abort()
.. -> src
>>> exec(src)
>>> conn.root.x
1
>>> conn.close()
In this example, because we aborted the transaction, the value of
``conn.root.x`` was rolled back to 1.
There are a number of things going on here that deserve some
explanation. When using transactions, there are three kinds of
objects involved:
Transaction
Transactions represent units of work. They have beginnings and
ends. They provide the
:interface:`~transaction.interfaces.ITransaction` interface.
Transaction manager
Transaction managers create transactions and
provide APIs to start and end transactions. The transactions
managed are always sequential. There is always exactly one active
transaction associated with a transaction manager at any point in
time. Transaction managers provide the
:interface:`~transaction.interfaces.ITransactionManager` interface.
Data manager
Data managers manage data associated with transactions. ZODB
connections are data managers. The details of how they interact
with transactions aren't important here.
Explicit transaction managers
-----------------------------
ZODB connections have transaction managers associated with them when
they're opened. When we call the database :meth:`~ZODB.DB.open` method
without an argument, a thread-local transaction manager is used. Each
thread has its own transaction manager. When we called
``transaction.commit()`` above we were calling commit on the
thread-local transaction manager.
Because we used a thread-local transaction manager, all of the work in
the transaction needs to happen in the same thread. Similarly, only
one transaction can be active in a thread.
If we want to have transactions whose work is spread over multiple
threads, or if we want to run multiple simultaneous transactions in
a single thread, then we can create transaction managers ourselves and
pass them to :meth:`~ZODB.DB.open`::
tm = transaction.TransactionManager()
conn = db.open(tm)
conn.root.x = 2
tm.commit()
.. -> src
>>> exec(src)
In this example, to commit our work, we called ``commit()`` on the
transaction manager we created and passed to :meth:`~ZODB.DB.open`.
Context managers
----------------
In the examples above, the transaction beginnings were
implicit. Transactions were effectively
[#implicit-transaction-creation]_ created when the transaction
managers were created and when previous transactions were committed.
We can create transactions explicitly using
:meth:`~transaction.interfaces.ITransactionManager.begin`::
tm.begin()
.. -> src
>>> exec(src)
A more modern [#context-managers-are-new]_ way to manage transaction
boundaries is to use context managers and the Python ``with``
statement. Transaction managers are context managers, so we can use
them with the ``with`` statement directly::
with tm as trans:
    trans.note("incrementing x")
    conn.root.x += 1
.. -> src
>>> exec(src)
>>> conn.root.x
3
When used as a context manager, a transaction manager explicitly
begins a new transaction, executes the code block, commits the
transaction if there isn't an error, and aborts it if there is an
error.
We used ``as trans`` above to get the transaction.
Databases provide the :meth:`~ZODB.DB.transaction` method to execute a code
block in a transaction::
with db.transaction() as conn2:
    conn2.root.x += 1
.. -> src
>>> exec(src)
Here, when we used ``as``, we got a connection, not a transaction.
This is because a new connection is opened by the
:meth:`~ZODB.DB.transaction` method. A new transaction manager was
used as well.
Getting a connection's transaction manager
------------------------------------------
In the previous example, you may have wondered how one might get the
current transaction. Every connection has an associated transaction
manager, which is available as the ``transaction_manager`` attribute.
So, for example, if we wanted to set a transaction note::
with db.transaction() as conn2:
    conn2.transaction_manager.get().note("incrementing x again")
    conn2.root.x += 1
.. -> src
>>> exec(src)
>>> db.history(conn.root()._p_oid)[0]['description']
'incrementing x again'
Here, we used the
:meth:`~transaction.interfaces.ITransactionManager.get` method to get
the current transaction.
Connection isolation
--------------------
In the last few examples, we used a connection opened using
:meth:`~ZODB.DB.transaction`. This was distinct from and used a
different transaction manager than the original connection. If we
looked at the original connection, ``conn``, we'd see that it has the
same value for ``x`` that we set earlier:
>>> conn.root.x
3
This is because it's still in the same transaction that was implicitly
begun when a change was last committed against it. If we want to see
changes, we have to begin a new transaction:
>>> trans = tm.begin()
>>> conn.root.x
5
ZODB uses a timestamp-based commit protocol that provides `snapshot
isolation <https://en.wikipedia.org/wiki/Snapshot_isolation>`_.
Whenever we look at ZODB data, we see its state as of the time the
transaction began.
.. _conflicts-label:
Conflict errors
---------------
As mentioned in the previous section, each connection sees and
operates on a view of the database as of the transaction start time.
If two connections modify the same object at the same time, one of the
connections will get a conflict error when it tries to commit::
with db.transaction() as conn2:
    conn2.root.x += 1
conn.root.x = 9
tm.commit() # will raise a conflict error
.. -> src
>>> exec(src) # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ZODB.POSException.ConflictError: ...
If we executed this code, we'd get a ``ConflictError`` exception on the
last line. After a conflict error is raised, we'd need to abort the
transaction, or begin a new one, at which point we'd see the data as
written by the other connection:
>>> tm.abort()
>>> conn.root.x
6
The timestamp-based approach used by ZODB is referred to as an
*optimistic* approach, because it works best if there are no
conflicts.
The best way to avoid conflicts is to design your application so that
multiple connections don't update the same object at the same time.
This isn't always easy.
Sometimes you may need to queue some operations that update shared
data structures, like indexes, so the updates can be made by a
dedicated thread or process.
Conflict resolution
~~~~~~~~~~~~~~~~~~~
ZODB provides a conflict-resolution framework for merging conflicting
changes. This is implemented by `BTree
<https://pythonhosted.org/BTrees/>`_ buckets and ``Length`` objects.
The main data structures provided by BTrees, BTrees and TreeSets,
spread their data over multiple objects. The leaf-level objects,
called *buckets*, allow distinct keys to be updated without causing
conflicts [#usually-avoids-conflicts]_.
``Length`` objects are conflict-free counters that merge changes by
simply accumulating changes.
The use of BTree buckets, and to a lesser degree ``Length`` objects,
is a very common technique.
.. caution::
Conflict resolution weakens consistency. Resist the temptation to
try to implement conflict resolution yourself. In the future, ZODB
will provide greater control over conflict resolution, including
the option of disabling it.
It's generally best to avoid conflicts in the first place, if possible.
ZODB and atomicity
==================
ZODB provides atomic transactions. When using ZODB, it's important to
align work with transactions. Once a transaction is committed, it
can't be rolled back [#undo]_ automatically. For applications, this
implies that work that should be atomic shouldn't be split over
multiple transactions. This may seem somewhat obvious, but the rule
can be broken in non-obvious ways. For example, a Web API that splits
logical operations over multiple web requests, as is often done in
`REST
<https://en.wikipedia.org/wiki/Representational_state_transfer>`_
APIs, violates this rule.
Partial transaction error recovery using savepoints
---------------------------------------------------
A transaction can be split into multiple steps that can be rolled back
individually. This is done by creating savepoints. Changes in a
savepoint can be rolled back without rolling back an entire
transaction::
import ZODB
db = ZODB.DB(None) # using a mapping storage
with db.transaction() as conn:
    conn.root.x = 1
    conn.root.y = 0
    savepoint = conn.transaction_manager.savepoint()
    conn.root.y = 2
    savepoint.rollback()
with db.transaction() as conn:
    print(conn.root.x, conn.root.y) # prints 1 0
.. -> src
>>> exec(src)
1 0
If we executed this code, it would print 1 and 0, because while the
initial changes were committed, the changes in the savepoint were
rolled back.
A secondary benefit of savepoints is that they save any changes made
before the savepoint to a file, so that memory of changed objects can
be freed if they aren't used later in the transaction.
Concurrency, threads and processes
==================================
ZODB supports concurrency through transactions. Multiple programs can
operate independently in separate transactions. They synchronize at
transaction boundaries.
The most common way to run ZODB is with each program running in its
own thread. Usually the thread-local transaction manager is used.
You can use multiple threads per transaction and you can run multiple
transactions in a single thread. To do this, you need to instantiate
and use your own transaction manager, as described in `Explicit
transaction managers`_. To run multiple transactions
simultaneously in a thread, you need to use a separate transaction
manager for each transaction.
To spread a transaction over multiple threads, you need to keep in
mind that database connections, transaction managers and transactions
are **not thread-safe**. You have to prevent simultaneous access from
multiple threads. For this reason, **using multiple threads with a
single connection is not recommended**, but it is possible with care.
Using multiple processes
------------------------
Using multiple Python processes is a good way to scale an application
horizontally, especially given Python's `global interpreter lock
<https://wiki.python.org/moin/GlobalInterpreterLock>`_.
Some things to keep in mind when utilizing multiple processes:
- If using the :mod:`multiprocessing` module, you can't
[#cant-share-now]_ share databases or connections between
processes. When you launch a subprocess, you'll need to
re-instantiate your storage and database.
- You'll need to use a storage such as `ZEO
<https://github.com/zopefoundation/ZEO>`_, `RelStorage
<http://relstorage.readthedocs.io/en/latest/>`_, or `NEO
<http://www.neoppod.org/>`_, that supports multiple processes. None
of the included storages do.
.. [#usually-avoids-conflicts] Conflicts can still occur when buckets
split due to added objects causing them to exceed their maximum size.
.. [#undo] Transactions can't be rolled back, but they may be undone
in some cases, especially if subsequent transactions
haven't modified the same objects.
.. [#implicit-transaction-creation] Transactions are implicitly
created when needed, such as when data are first modified.
.. [#context-managers-are-new] ZODB and the transaction package
predate context managers and the Python ``with`` statement.
.. [#cant-share-now] at least not now.
......@@ -6,6 +6,6 @@ Reference Documentation
.. toctree::
:maxdepth: 2
zodb.rst
storages.rst
zodb
storages
transaction
============
Transactions
============
Transaction support is provided by the `transaction
<http://transaction.readthedocs.io/en/latest/>`_ package, which is
installed automatically when you install ZODB. There are 2 important
APIs provided by the transaction package, ``ITransactionManager`` and
``ITransaction``, described below.
ITransactionManager
===================
.. autointerface:: transaction.interfaces.ITransactionManager
:members: begin, get, commit, abort, doom, isDoomed, savepoint
ITransaction
============
.. autointerface:: transaction.interfaces.ITransaction
:members: user, description, commit, abort, doom, savepoint, note,
setUser, setExtendedInfo,
addBeforeCommitHook, getBeforeCommitHooks,
addAfterCommitHook, getAfterCommitHooks
......@@ -84,7 +84,8 @@ Connections
.. autoclass:: ZODB.Connection.Connection
:members: add, cacheGC, cacheMinimize, close, db, get,
getDebugInfo, get_connection, isReadOnly, oldstate,
onCloseCallback, root, setDebugInfo, sync
onCloseCallback, root, setDebugInfo, sync,
transaction_manager
TimeStamp (transaction ids)
===========================
......
......@@ -50,6 +50,8 @@ import six
from .mvccadapter import HistoricalStorageAdapter
from . import valuedoc
global_reset_counter = 0
noop = lambda : None
......@@ -88,6 +90,9 @@ class Connection(ExportImport, object):
_code_timestamp = 0
#: Transaction manager associated with the connection when it was opened.
transaction_manager = valuedoc.ValueDoc('current transaction manager')
##########################################################################
# Connection methods, ZODB.IConnection
......
......@@ -43,6 +43,7 @@ def test_suite():
manuel.doctest.Manuel() + manuel.capture.Manuel(),
join(guide, 'writing-persistent-objects.rst'),
join(guide, 'install-and-run.rst'),
join(guide, 'transactions-and-threading.rst'),
join(reference, 'zodb.rst'),
join(reference, 'storages.rst'),
setUp=setUp, tearDown=tearDown,
......