Commit 2cb29855 authored by Jim Fulton

Started topic on writing persistent objects

Still more work to do.

Added machinery to make the documentation testable.
parent 2cf6eab4
 [buildout]
-develop =
+develop = .
 parts =
     stxpy
+    test
-versions = versions
-unzip = true
-eggs =
-[versions]
-zc.buildout =
-zc.recipe.egg =
+[test]
+recipe = zc.recipe.testrunner
+eggs = zodbdocumentationtests
 [stxpy]
 recipe = zc.recipe.egg
......

==========================
Writing persistent objects
==========================

In the :ref:`Tutorial <tutorial-label>`, we discussed the basics of
implementing persistent objects by subclassing
``persistent.Persistent``. This is probably enough for 80% of the
persistent-object classes you write, but there are some other aspects
of writing persistent classes you should be aware of.

Access and modification
=======================

Two of the main jobs of the ``Persistent`` base class are to detect
when an object has been accessed and when it has been modified. When
an object is accessed, its state may need to be loaded from the
database. When an object is modified, the modification needs to be
saved when the transaction is committed.

``Persistent`` detects object accesses and modifications by hooking
into attribute access and assignment. For modifications, there may be
other ways of changing an object's state, such as mutating a list held
in an attribute, that we need to make provision for.

Rules of persistence
====================

When implementing persistent objects, be aware that an object's
attributes should be:

- immutable (such as strings or integers),
- persistent (subclasses of ``Persistent``), or
- mutable and non-persistent, in which case you need to take special
  precautions.

If you modify, in place, a mutable non-persistent value held in an
attribute of a persistent object, you need to mark the persistent
object as changed yourself::

    import persistent

    class Book(persistent.Persistent):

        def __init__(self, title):
            self.title = title
            self.authors = []

        def add_author(self, author):
            self.authors.append(author)
            self._p_changed = True

.. -> src

>>> exec(src)
>>> db = ZODB.DB(None)
>>> with db.transaction() as conn:
...     conn.root.book = Book("ZODB")
>>> conn = db.open()
>>> book = conn.root.book
>>> bool(book._p_changed)
False
>>> book.authors.append('Jim')
>>> bool(book._p_changed)
False
>>> book.add_author('Carlos')
>>> bool(book._p_changed)
True
>>> db.close()

In this example, ``Book`` objects have an ``authors`` attribute that's
a regular Python list, so it's mutable and non-persistent. When we add
an author, we append it to the ``authors`` attribute's value. Because
we didn't set an attribute on the book, it's not marked as changed, so
we set ``_p_changed`` ourselves.

Using standard Python lists, dicts, or sets is a common thing to do,
so this pattern of setting ``_p_changed`` is common.

Let's look at some alternatives.

Using tuples for small collections instead of lists
---------------------------------------------------

If objects contain collections that are small or that don't change
often, you can use tuples instead of lists::

    import persistent

    class Book(persistent.Persistent):

        def __init__(self, title):
            self.title = title
            self.authors = ()

        def add_author(self, author):
            self.authors += (author, )

.. -> src

>>> exec(src)
>>> db = ZODB.DB(None)
>>> with db.transaction() as conn:
...     conn.root.book = Book("ZODB")
>>> conn = db.open()
>>> book = conn.root.book
>>> bool(book._p_changed)
False
>>> book.add_author('Carlos')
>>> bool(book._p_changed)
True
>>> db.close()

Because tuples are immutable, they satisfy the rules of persistence
without any special handling: ``+=`` builds a new tuple and assigns it
to the attribute, so the book is marked as changed.

Using persistent data structures
--------------------------------

The ``persistent`` package provides persistent versions of ``list``
and ``dict``, namely ``persistent.list.PersistentList`` and
``persistent.mapping.PersistentMapping``. We can update our example to
use ``PersistentList``::

    import persistent
    import persistent.list

    class Book(persistent.Persistent):

        def __init__(self, title):
            self.title = title
            self.authors = persistent.list.PersistentList()

        def add_author(self, author):
            self.authors.append(author)

.. -> src

>>> exec(src)
>>> db = ZODB.DB(None)
>>> with db.transaction() as conn:
...     conn.root.book = Book("ZODB")
>>> conn = db.open()
>>> book = conn.root.book
>>> bool(book._p_changed)
False
>>> book.add_author('Carlos')
>>> bool(book._p_changed)
False
>>> bool(book.authors._p_changed)
True
>>> db.close()

Note that in this example, when we added an author, the book itself
didn't change, but the ``authors`` attribute value did. Because
``authors`` is a persistent object, it's stored in a separate database
record from the book record and is managed by ZODB independently of
the book.

In addition to ``PersistentList`` and ``PersistentMapping``, general
persistent data structures are provided by the ``BTrees`` package,
most notably ``BTree`` and ``TreeSet`` objects. Unlike
``PersistentList`` and ``PersistentMapping``, ``BTree`` and
``TreeSet`` objects are scalable and can easily hold millions of
objects, because their data are spread over many subobjects.

It's generally better to use ``BTree`` objects than
``PersistentMapping`` objects, because they're scalable and because
they handle :ref:`conflicts <conflicts-label>` better. ``TreeSet``
objects are the only ZODB-provided persistent set implementation.

``BTree`` and ``TreeSet`` objects come in a number of families,
provided by different modules, that differ in the types of keys and
values they support and in their internal implementations:

================ ================ ================
Module           Key type         Value type
================ ================ ================
BTrees.OOBTree   object           object
BTrees.IOBTree   32-bit integer   object
BTrees.OIBTree   object           32-bit integer
BTrees.IIBTree   32-bit integer   32-bit integer
BTrees.IFBTree   32-bit integer   float
BTrees.LOBTree   64-bit integer   object
BTrees.OLBTree   object           64-bit integer
BTrees.LLBTree   64-bit integer   64-bit integer
BTrees.LFBTree   64-bit integer   float
================ ================ ================

Here's a version of the example that uses a ``TreeSet``::

    import persistent
    from BTrees.OOBTree import TreeSet

    class Book(persistent.Persistent):

        def __init__(self, title):
            self.title = title
            self.authors = TreeSet()

        def add_author(self, author):
            self.authors.add(author)

.. -> src

>>> exec(src)
>>> db = ZODB.DB(None)
>>> with db.transaction() as conn:
...     conn.root.book = Book("ZODB")
>>> conn = db.open()
>>> book = conn.root.book
>>> bool(book._p_changed)
False
>>> book.add_author('Carlos')
>>> bool(book._p_changed)
False
>>> bool(book.authors._p_changed)
True
>>> db.close()

Properties
==========

If you implement some attributes using Python properties (or other
types of descriptors), they are treated just like any other attributes
by the persistence machinery. When you set an attribute through a
property, the object is considered changed, even if the property
didn't actually modify the object state.

Special attributes
==================

There are some attributes that are treated specially.

Attributes with names starting with ``_p_`` are reserved for use by
the persistence machinery and by ZODB. These include:

_p_changed
    The ``_p_changed`` attribute has the value ``None`` if the
    object is a :ref:`ghost <ghost-label>`, ``True`` if it's changed,
    and ``False`` if it's not a ghost and not changed.

_p_oid
    The object's unique id in the database.

_p_serial
    The object's revision identifier, also known as the object serial
    number or the object transaction id. It's a timestamp and, if not
    set, has the value 0 encoded as a string of 8 zero bytes.

_p_jar
    The database connection the object was accessed through. This is
    commonly used by database-aware application code to get hold of an
    object's database connection.

Attributes with names starting with ``_v_`` are treated as volatile.
They aren't saved to the database. They are useful for caching data
that can be computed from saved data and shouldn't be saved. They
should be treated as though they can disappear between transactions.
Setting a volatile attribute doesn't cause an object to be considered
modified.

An object's ``__dict__`` attribute is treated specially in that
getting it doesn't cause an object's state to be loaded. It may have
the value ``None`` rather than a dictionary for :ref:`ghosts
<ghost-label>`.

Object storage and management
=============================

Every persistent object is stored in its own database record. Some
storages maintain multiple object revisions, in which case each
persistent object is stored in its own set of records. Data for
different persistent objects are stored separately.

The database manages each object separately, according to a lifecycle
described in `Object lifecycle states and special attributes
(advanced)`_, below.

This is important when considering how to distribute data across your
objects. If you use lots of little persistent objects, then more
objects may need to be loaded or saved and you may incur more memory
overhead. On the other hand, if objects are too big, you may load or
save more data than would otherwise be needed.

.. _schema-migration-label:

Schema migration
================

Object requirements and implementations tend to evolve over time.
This isn't a problem for objects that are short-lived, but persistent
objects may have lifetimes that extend for years. There needs to be
some way of making sure that state saved with an older object schema
can still be loaded into an object with the new schema.
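
Adding attributes
-----------------

Perhaps the most common schema change is to add information. This can
often be accomplished by adding a default value in a class
definition::

    import persistent
    from BTrees.OOBTree import TreeSet

    class Book(persistent.Persistent):

        publisher = 'UNKNOWN'

        def __init__(self, title, publisher):
            self.title = title
            self.publisher = publisher
            self.authors = TreeSet()

        def add_author(self, author):
            self.authors.add(author)

Books saved before ``publisher`` was added have no ``publisher`` in
their saved state, so reading the attribute falls back to the class
attribute.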

Removing attributes
-------------------

Removing attributes generally doesn't require any action, assuming
that their presence in older objects doesn't do any harm.

Renaming/moving classes
-----------------------

The easiest way to handle renaming or moving classes is to leave
aliases for the old name. For example, if we have a class,
``library.Book``, and want to move it to ``catalog.Publication``, we
can keep a ``library`` module that contains::

    from catalog import Publication as Book  # XXX deprecated name

A downside of this approach is that it clutters code and may even
cause us to keep modules solely to hold aliases. (`zope.deferredimport
<http://zopedeferredimport.readthedocs.io/en/latest/narrative.html>`_
can help with this by making these aliases a little more efficient and
by generating deprecation warnings.)
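
ZODB records an object's class by module and name, much as ``pickle`` does, and looks the name up again when the object is loaded; that's why an alias is enough. The sketch below shows the mechanism with plain ``pickle`` rather than a database, and creates throwaway ``library`` and ``catalog`` modules on the fly so it's self-contained.

```python
import pickle
import sys
import types

# Throwaway modules standing in for the real ones.
library = types.ModuleType("library")
catalog = types.ModuleType("catalog")
sys.modules["library"] = library
sys.modules["catalog"] = catalog


class Book:
    # The old class, as it existed when the data was saved.
    def __init__(self, title):
        self.title = title


Book.__module__ = "library"
library.Book = Book
saved = pickle.dumps(Book("ZODB"))  # records the class as library.Book


class Publication:
    # The class after being renamed and moved to the catalog module.
    def __init__(self, title):
        self.title = title


Publication.__module__ = "catalog"
catalog.Publication = Publication

# What's left of the library module: an alias for the new class.
library.Book = catalog.Publication  # XXX deprecated name

loaded = pickle.loads(saved)        # resolves library.Book -> Publication
```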

Object lifecycle states and special attributes (advanced)
=========================================================

Persistent objects typically transition through a collection of
states. Most of the time, you don't need to think too much about this.

Unsaved
    When an object is created, it's said to be in an *unsaved* state
    until it's associated with a database.

Added
    When an unsaved object is added to a database, but hasn't been
    saved by committing a transaction, it's in the *added* state.

    Note that most objects are added implicitly, by being set as
    subobjects (attribute values or items) of objects already in the
    database.

Saved
    When an object is added and saved through a transaction commit, the
    object is in the *saved* state.

Changed
    When a saved object is updated, it enters the *changed* state to
    indicate that there are changes that need to be committed.

.. _ghost-label:

Ghost
    An object in the *ghost* state is an empty shell. It has no
    state. When it's accessed, its state will be loaded automatically,
    and it will enter the saved state. A saved object can become a
    ghost if it hasn't been accessed in a while and the database
    releases its state to make room for other objects. A changed
    object can become a ghost if the transaction it's modified in is
    aborted.

    An object that's loaded from the database is loaded as a
    ghost. This typically happens when the object is a subobject of
    another object whose state is loaded.

We can interrogate and control an object's state, although somewhat
indirectly. To do this, we'll look at some special persistent-object
attributes, described in `Special attributes`_, above.

Let's look at some state transitions with an example. First, we create
an unsaved book:

>>> book = Book("ZODB")
>>> from ZODB.utils import z64
>>> book._p_changed, bool(book._p_oid)
(False, False)

We can tell that it's unsaved because it doesn't have an object id, ``_p_oid``.

If we add it to a database:

>>> import ZODB
>>> connection = ZODB.connection(None)
>>> connection.add(book)
>>> book._p_changed, bool(book._p_oid), book._p_serial == z64
(False, True, True)

We know it's added because it has an oid, but its serial (object
revision timestamp), ``_p_serial``, is the special zero value, and its
value for ``_p_changed`` is False.

If we commit the transaction that added it:

>>> import transaction
>>> transaction.commit()
>>> book._p_changed, bool(book._p_oid), book._p_serial == z64
(False, True, False)

We see that the object is in the saved state because it has an object
id and a non-zero serial, and is unchanged.

Now if we modify the object, it enters the changed state:

>>> book.title = "ZODB Explained"
>>> book._p_changed, bool(book._p_oid), book._p_serial == z64
(True, True, False)

If we abort the transaction, the object becomes a ghost:

>>> transaction.abort()
>>> book._p_changed, bool(book._p_oid)
(None, True)

We can see it's a ghost because ``_p_changed`` is None.
(``_p_serial`` isn't meaningful for ghosts.)

If we access the object, it will be loaded into the saved state, which
is indicated by a false ``_p_changed``, an object id, and a non-zero
serial:

>>> book.title
'ZODB'
>>> book._p_changed, bool(book._p_oid), book._p_serial == z64
(False, True, False)

Note that accessing ``_p_`` attributes didn't cause the object's state
to be loaded.

We've already seen how modifying ``_p_changed`` can cause an object to
be marked as modified. We can also use it to make an object into a
ghost:

>>> book._p_changed = None
>>> book._p_changed, bool(book._p_oid)
(None, True)

Other things you can do, but shouldn't
======================================

The first rule here is don't be clever!!! It's soooo tempting to be
clever, but it's almost never worth it.

Overriding ``__getstate__`` and ``__setstate__``
------------------------------------------------

When an object is saved in a database, its ``__getstate__`` method is
called without arguments. The default implementation simply returns a
copy of an object's instance dictionary, leaving out volatile
(``_v_``) and ``_p_`` attributes. (It's a little more complicated for
objects with slots.)

An object's state is loaded by loading the state from the database and
passing it to the object's ``__setstate__`` method. The default
implementation expects a dictionary, which it uses to populate the
object's instance dictionary.

Early on, we thought that overriding these methods would be useful for
tasks like providing more efficient state representations or for
:ref:`schema migration <schema-migration-label>`, but we found that
the result was to make object implementations brittle and/or complex
and the benefit usually wasn't worth it.

Overriding ``__getattr__``, ``__getattribute__``, or ``__setattr__``
----------------------------------------------------------------------

This is something extremely clever people might attempt, but it's
probably never worth the bother. It's possible, but it requires such
deep understanding of persistence and internals that we're not even
going to document it. :)

##############################################################################
#
# Copyright (c) Zope Foundation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.0 (ZPL). A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
##############################################################################
import os
import doctest
import unittest

import manuel.capture
import manuel.doctest
import manuel.testing
import zope.testing.module

from os.path import join


def setUp(test):
    import ZODB
    test.globs.update(
        ZODB=ZODB,
        )
    zope.testing.module.setUp(test)


def tearDown(test):
    zope.testing.module.tearDown(test)


def test_suite():
    here = os.path.dirname(__file__)
    guide = join(here, '..', 'documentation', 'guide')
    return unittest.TestSuite((
        manuel.testing.TestSuite(
            manuel.doctest.Manuel() + manuel.capture.Manuel(),
            join(guide, 'writing-persistent-objects.rst'),
            setUp=setUp, tearDown=tearDown,
            ),
        ))


if __name__ == '__main__':
    unittest.main(defaultTest='test_suite')