Major update, featuring SSL suppprt

83876966 · Jim Fulton · 01cda6b4 · 83876966
Commit 83876966 authored Jun 22, 2016 by Jim Fulton
Show whitespace changes
Inline Side-by-side

Showing with 658 additions and 0 deletions

doc/README.rst doc/README.rst +658 -0

No files found.
--- a/doc/README.rst
+++ b/doc/README.rst
+==========================
+ZEO HOWTO
+==========================
+
+.. contents;;
+
+Introduction
+============
+
+ZEO (Zope Enterprise Objects) is a client-server system for sharing a
+single storage among many clients. When you use ZEO, a lower-level
+storage, typically a file storage, is opened in the ZEO server
+process.  Client programs connect to this process using a ZEO
+ClientStorage.  ZEO provides a consistent view of the database to all
+clients.  The ZEO client and server communicate using a custom
+protocol layered on top of TCP.
+
+There are several features that affect the behavior of
+ZEO.  This section describes how a few of these features
+work.  Subsequent sections describe how to configure every option.
+
+Client cache
+------------
+
+Each ZEO client keeps an on-disk cache of recently used data records
+to avoid fetching those records from the server each time they are
+requested.  It is usually faster to read the objects from disk than it
+is to fetch them over the network.  The cache can also provide
+read-only copies of objects during server outages.
+
+The cache may be persistent or transient. If the cache is persistent,
+then the cache files are retained for use after process restarts. A
+non-persistent cache uses temporary files that are removed when the
+client storage is closed.
+
+The client cache size is configured when the ClientStorage is created.
+The default size is 20MB, but the right size depends entirely on the
+particular database.  Setting the cache size too small can hurt
+performance, but in most cases making it too big just wastes disk
+space.
+
+ZEO uses invalidations for cache consistency.  Every time an object is
+modified, the server sends a message to each client informing it of
+the change.  The client will discard the object from its cache when it
+receives an invalidation. (It's actually a little more complicated,
+but we won't get into that here.)
+
+Each time a client connects to a server, it must verify that its cache
+contents are still valid.  (It did not receive any invalidation
+messages while it was disconnected.)  This involves asking the server
+to replay invalidations it missed. If it's been disconnected too long,
+it discards its cache.
+
+
+Invalidation queue
+------------------
+
+The ZEO server keeps a queue of recent invalidation messages in
+memory.  When a client connects to the server, it sends the timestamp
+of the most recent invalidation message it has received.  If that
+message is still in the invalidation queue, then the server sends the
+client all the missing invalidations.
+
+The default size of the invalidation queue is 100.  If the
+invalidation queue is larger, it will be more likely that a client
+that reconnects will be able to verify its cache using the queue.  On
+the other hand, a large queue uses more memory on the server to store
+the message.  Invalidation messages tend to be small, perhaps a few
+hundred bytes each on average; it depends on the number of objects
+modified by a transaction.
+
+You can also provide an invalidation age when configuring the
+server. In this case, if the invalidation queue is too small, but a
+client has been disconnected for a time interval that is less than the
+invalidation age, then invalidations are replayed by iterating over
+the lower-level storage on the server.  If the age is too high, and
+clients are disconneced for a long time, then this can put a lot of
+load on the server.
+
+Transaction timeouts
+--------------------
+
+A ZEO server can be configured to timeout a transaction if it takes
+too long to complete.  Only a single transaction can commit at a time;
+so if one transaction takes too long, all other clients will be
+delayed waiting for it.  In the extreme, a client can hang during the
+commit process.  If the client hangs, the server will be unable to
+commit other transactions until it restarts.  A well-behaved client
+will not hang, but the server can be configured with a transaction
+timeout to guard against bugs that cause a client to hang.
+
+If any transaction exceeds the timeout threshold, the client's
+connection to the server will be closed and the transaction aborted.
+Once the transaction is aborted, the server can start processing other
+client's requests.  Most transactions should take very little time to
+commit.  The timer begins for a transaction after all the data has
+been sent to the server.  At this point, the cost of commit should be
+dominated by the cost of writing data to disk; it should be unusual
+for a commit to take longer than 1 second.  A transaction timeout of
+30 seconds should tolerate heavy load and slow communications between
+client and server, while guarding against hung servers.
+
+When a transaction times out, the client can be left in an awkward
+position.  If the timeout occurs during the second phase of the two
+phase commit, the client will log a panic message.  This should only
+cause problems if the client transaction involved multiple storages.
+If it did, it is possible that some storages committed the client
+changes and others did not.
+
+Connection management
+---------------------
+
+A ZEO client manages its connection to the ZEO server.  If it loses
+the connection, it attempts to reconnect.  While
+it is disconnected, it can satisfy some reads by using its cache.
+
+The client can be configured with multiple server addresses.  In this
+case, it assumes that each server has identical content and will use
+any server that is available.  It is possible to configure the client
+to accept a read-only connection to one of these servers if no
+read-write connection is available.  If it has a read-only connection,
+it will continue to poll for a read-write connection.
+
+If a single address resolves to multiple IPv4 or IPv6 addresses,
+the client will connect to an arbitrary of these addresses.
+
+Authentication
+--------------
+
+ZEO supports SSL certificate authentication.
+
+Installing software
+===================
+
+ZEO is installed like any other Python package using pip, buildout, or
+pther Python packaging tools.
+
+Running the server
+==================
+
+Typically, the ZEO server is run using the ``runzeo`` script that's
+installed as part of a ZEO installation.  The ``runzeo`` script
+accepts command line options, the most important of which is the
+``-C`` (``--configuration``) option.  ZEO servers are best configured
+via configuration files.  The ``runzeo`` script also accepts some
+command-line arguments for ad-hoc configurations, but there's an
+easier way to run an ad-hoc server described below.  For more on
+configuraing a ZEO server see `Server configuration`_ below.
+
+Server quick-start/ad-hoc operation
+-----------------------------------
+
+You can quickly start a ZEO server from a Python prompt::
+
+  import ZEO
+  address, stop = ZEO.server()
+
+This runs a ZEO server on a dynamic address and using an in-memory
+storage.
+
+We can then create a ZEO client connection using the address
+returned::
+
+  connection = ZEO.connection(addr)
+
+This is a ZODB connection for a database opened on a client storage
+instance created on the fly.  This is a shorthand for::
+
+  db = ZEO.DB(addr)
+  connection = db.open()
+
+Which is a short-hand for::
+
+  client_storage = ZEO.client(addr)
+
+  import ZODB
+  db = ZODB.db(client_storage)
+  connection = db.open()
+
+If you exit the Python process, the storage exits as well, as it's run
+in an in-process thread.  It will leave behind it's configuration and
+log file.  This provides a handy way to get a configuration example.
+
+You shut down the server more cleanly by calling the stop function
+returned by the ``ZEO.server`` function.  This will remove the
+temporary configuration file.
+
+To have data stored persistently, you can specify a file-storage path
+name using a ``path`` parameter.  If you want blob support, you can
+specify a blob-file directory using the ``blob_dir`` directory.
+
+You can also supply a port to listen on, full storage configuration
+and ZEO server configuration options to the ``ZEO.server``
+function. See it's documentation string for more information.
+
+Server configuration
+--------------------
+
+The script runzeo.py runs the ZEO server.  The server can be
+configured using command-line arguments or a config file.  This
+document only describes the config file.  Run runzeo.py
+-h to see the list of command-line arguments.
+
+The configuration file specifies the underlying storage the server
+uses, the address it binds to, and a few other optional parameters.
+An example is::
+
+    <zeo>
+      address zeo.example.com:8090
+    </zeo>
+
+    <filestorage>
+      path /var/tmp/Data.fs
+    </filestorage>
+
+    <eventlog>
+      <logfile>
+        path /var/tmp/zeo.log
+        format %(asctime)s %(message)s
+      </logfile>
+    </eventlog>
+
+The format is similar to the Apache configuration format.  Individual
+settings have a name, 1 or more spaces and a value, as in::
+
+  address zeo.example.com:8090
+
+Settings are grouped into hierarchical sections.
+
+The example above configures a server to use a file storage from
+``/var/tmp/Data.fs``.  The server listens on port ``8090`` of
+``zeo.example.com``.  The ZEO server writes its log file to
+``/var/tmp/zeo.log`` and uses a custom format for each line.  Assuming the
+example configuration it stored in ``zeo.config``, you can run a server by
+typing::
+
+    runzeo -C zeo.config
+
+A configuration file consists of a <zeo> section and a storage
+section, where the storage section can use any of the valid ZODB
+storage types.  It may also contain an eventlog configuration.  See
+ZODB documentation for information on configuring storages. See
+`Configuring event logs <logs.rst>`_ for information on configuring
+server logs.
+
+An **easy way to get started** with configuration is to run an add-hoc
+server and copy the generated configuration.
+
+The zeo section must list the address.  All the other keys are
+optional.
+
+address
+        The address at which the server should listen.  This can be in
+        the form 'host:port' to signify a TCP/IP connection or a
+        pathname string to signify a Unix domain socket connection (at
+        least one '/' is required).  A hostname may be a DNS name or a
+        dotted IP address.  If the hostname is omitted, the platform's
+        default behavior is used when binding the listening socket (''
+        is passed to socket.bind() as the hostname portion of the
+        address).
+
+read-only
+        Flag indicating whether the server should operate in read-only
+        mode.  Defaults to false.  Note that even if the server is
+        operating in writable mode, individual storages may still be
+        read-only.  But if the server is in read-only mode, no write
+        operations are allowed, even if the storages are writable.  Note
+        that pack() is considered a read-only operation.
+
+invalidation-queue-size
+        The storage server keeps a queue of the objects modified by the
+        last N transactions, where N == invalidation_queue_size.  This
+        queue is used to support client cache verification when a client
+        disconnects for a short period of time.
+
+invalidation-age
+        The maximum age of a client for which quick-verification
+        invalidations will be provided by iterating over the served
+        storage. This option should only be used if the served storage
+        supports efficient iteration from a starting point near the
+        end of the transaction history (e.g. end of file).
+
+transaction-timeout
+        The maximum amount of time, in seconds, to wait for a
+        transaction to commit after acquiring the storage lock,
+        specified in seconds.  If the transaction takes too long, the
+        client connection will be closed and the transaction aborted.
+
+        This defaults to 30 seconds.
+
+SSL configuration
+~~~~~~~~~~~~~~~~~
+
+A server can optionally support SSL.  Do do so, include a `ssl`
+subsection of the ZEO section, as in::
+
+    <zeo>
+      address zeo.example.com:8090
+      <ssl>
+        certificate server_certificate.pem
+        key = server_certificate_key.pem
+      </ssl>
+    </zeo>
+
+    <filestorage>
+      path /var/tmp/Data.fs
+    </filestorage>
+
+    <eventlog>
+      <logfile>
+        path /var/tmp/zeo.log
+        format %(asctime)s %(message)s
+      </logfile>
+    </eventlog>
+
+The ``ssl`` section has settings:
+
+certificate
+        The path to an SSL certificate file for the server.
+
+key
+        The path to the SSL key file for the server certificate.
+
+authenticate
+        The path to a file or directory containing client certificates
+        to authenticate.  ((See the ``cafile`` and ``capath``
+        parameters in the Python documentation for
+        ``ssl.SSLContext.load_verify_locations``.)
+
+        If this setting is used. then certificate authentication is
+        used to authenticate clients.  A client must be configuted
+        with one of the certificates supplied using this setting.
+
+Running the ZEO server as a daemon
+----------------------------------
+
+In an operational setting, you will want to run the ZEO server a
+daemon process that is restarted when it dies.  The zdaemon package
+provides two tools for running daemons: zdrun.py and zdctl.py. You can
+find zdaemon and it's documentation at
+http://pypi.python.org/pypi/zdaemon.
+
+Note that ``runzeo`` makes no attempt to implemnt a well behaved
+daemon. It expects that functionality to be provided by a wrapper like
+zdaemon or supervisord.
+
+Rotating log files
+------------------
+
+``runzeo`` will re-initialize its logging subsystem when it receives a
+SIGUSR2 signal.  If you are using the standard event logger, you
+should first rename the log file and then send the signal to the
+server.  The server will continue writing to the renamed log file
+until it receives the signal.  After it receives the signal, the
+server will create a new file with the old name and write to it.
+
+ZEO Clients
+===========
+
+To use a ZEO server, you need to connect to it using a ZEO client
+storage.  You create client storages either using a Python API or
+using a ZODB storage configuration in a ZODB storage configuration
+section.
+
+Python API for creating a ZEO client storage
+--------------------------------------------
+
+To create a client storage from Python, use the ``ZEO.client``
+function::
+
+    import ZEO
+    client = ZEO.client(8200)
+
+In the example above, we created a client that connected to a storage
+listening on port 8200 on local host.  The first argument is an
+address, or list of addresses to connect to.  There are many additinal
+options, decumented below that should be given as keyword arguments.
+
+Addresses can be:
+
+- A host/port tuple
+
+- An integer, which implies that the host is '127.0.0.1'
+
+- A unix domain socket file name.
+
+Options:
+
+cache_size
+   The cache size in bytes. This defaults to a 20MB.
+
+cache
+   The ZEO cache to be used.  This can be a file name, which will
+   cause a persisetnt standard persistent ZEO cache to be used and
+   stored in the given name.  This can also be an object that
+   implements ``ZEO.interfaces.ICache``.
+
+   If not specified, then a non-persistent cache will be used.
+
+blob_dir
+   The name of a directory to hold/cache blob data downloaded from the
+   server.  This must be provided if blobs are to be used.  (Of
+   course, the server storage must be configured to use blobs as
+   well.)
+
+shared_blob_dir
+   A client can use a network files system (or a local directory if
+   the server runs on the same machine) to share a blob directory with
+   the server.  This allows downloading of blobs (except via a
+   distributed file system) to be avoided.
+
+blob_cache_size
+   The size of the blob cache in bytes.  IF unset, then blobs will
+   accumulate. If set, then blobs are removed when the total size
+   exceeds this amount.  Blobs accessed least recently are removed
+   first.
+
+blob_cache_size_check
+   The total size of data to be downloaded to trigger blob cache size
+   reduction. The defaukt is 10 (percent).  This controls how often to
+   remove blobs from the cache.
+
+ssl
+   An ``ssl.SSLContext`` object used to make SSL connections.
+
+ssl_server_hostname
+   Host name to use for SSL host name checks.
+
+   If using SSL and if host name checking is enabled in the given SSL
+   context then use this as the value to check.  If an address is a
+   host/port pair, then this defaults to the host in the address.
+
+read_only
+   Set to true for a read-only connection.
+
+   If false (the default), then request a read/write connection.
+
+   This option is ignored if ``read_only_fallback`` is set to a true value.
+
+read_only_fallback
+   Set to true, then prefer a read/write connection, but be willing to
+   use a read-only connection.  This defaults to a false value.
+
+   If ``read_only_fallback`` is set, then ``read_only`` is ignored.
+
+wait_timeout
+   How long to wait for an initial connection, defaulting to 30
+   seconds.  If an initial connection can't be made within this time
+   limit, then creation of the client storage will fail with a
+   ``ZEO.Exceptions.ClientDisconnected`` exception.
+
+   After the initial connection, if the client is disconnected:
+
+   - In-flight server requests will fail with a
+     ``ZEO.Exceptions.ClientDisconnected`` exception.
+
+   - New requests will block for up to ``wait_timeout`` waiting for a
+     connection to be established before failing with a
+     ``ZEO.Exceptions.ClientDisconnected`` exception.
+
+client_label
+   A short string to display in *server* logs for an event relating to
+   this client. This can be helpful when debugging.
+
+disconnect_poll
+   The delay in seconds between attempts to connect to the
+   server, in seconds.  Defaults to 1 second.
+
+Configuration strings/files
+---------------------------
+
+ZODB databases and storages can be configured using configuration
+files, or strings (extracted from configuration files).  They use the
+same syntax as the server configuration files described above, but
+with different sections and options.
+
+An application that used ZODB might configure it's database using a
+string like::
+
+  <zodb>
+     cache-size-bytes 1000MB
+
+     <filestorage>
+       path /var/lib/Data.fs
+     </filestorage>
+  </zodb>
+
+In this example, we configured a ZODB database with a object cache
+size of 1GB.  Inside the database, we configured a file storage.  The
+``filestorage`` section provided file-storage parameters.  We saw a
+similar section in the storage-server configuration example in `Server
+configuration`_.
+
+To configure a client storage, you use a ``clientstorage`` section,
+but first you have to import it's definition, because ZEO isn't built
+into ZODB.  Here's an example::
+
+  <zodb>
+     cache-size-bytes 1000MB
+
+     %import ZEO
+
+     <clientstorage>
+       server 8200
+     </clientstorage>
+  </zodb>
+
+In this example, we defined a client storage that connected to a
+server on port 8200.
+
+The following settings are supported:
+
+cache-size
+   The cache size in bytes, KB or MB. This defaults to a 20MB.
+   Optional ``KB`` or ``MB`` suffixes can (and usually are) used to
+   specify units other than bytes.
+
+cache-path
+   The file path of a persistent cache file
+
+blob-dir
+   The name of a directory to hold/cache blob data downloaded from the
+   server.  This must be provided if blobs are to be used.  (Of
+   course, the server storage must be configured to use blobs as
+   well.)
+
+shared-blob-dir
+   A client can use a network files system (or a local directory if
+   the server runs on the same machine) to share a blob directory with
+   the server.  This allows downloading of blobs (except via a
+   distributed file system) to be avoided.
+
+blob-cache-size
+   The size of the blob cache in bytes.  IF unset, then blobs will
+   accumulate. If set, then blobs are removed when the total size
+   exceeds this amount.  Blobs accessed least recently are removed
+   first.
+
+blob-cache-size-check
+   The total size of data to be downloaded to trigger blob cache size
+   reduction. The defaukt is 10 (percent).  This controls how often to
+   remove blobs from the cache.
+
+ssl-certificate
+   The full path to an SSL certificate file. For the client. This
+   is needed if the server requires client authentication.
+
+ssl-key
+   The full path to an SSL key file for the client certificate.
+
+ssl-server-certificates
+   The full path to a file containing server certificates to be
+   authenticated.
+
+ssl-server-certificate-directory
+   The full path to a directory containing server certificates to be
+   authenticated.
+
+ssl-check-hostname
+   Set this to true to require checking of the server's host name.
+
+ssl-server-hostname
+   Host name to use for SSL host name checks.
+
+   If using SSL and ``ssl-check-hostname`` then use this as the
+   value to check.  If an address is a host/port pair, then this
+   defaults to the host in the address.
+
+read-only
+   Set to true for a read-only connection.
+
+   If false (the default), then request a read/write connection.
+
+   This option is ignored if ``read_only_fallback`` is set to a true value.
+
+read-only-fallback
+   Set to true, then prefer a read/write connection, but be willing to
+   use a read-only connection.  This defaults to a false value.
+
+   If ``read_only_fallback`` is set, then ``read_only`` is ignored.
+
+wait_timeout
+   How long to wait for an initial connection, defaulting to 30
+   seconds.  If an initial connection can't be made within this time
+   limit, then creation of the client storage will fail with a
+   ``ZEO.Exceptions.ClientDisconnected`` exception.
+
+   After the initial connection, if the client is disconnected:
+
+   - In-flight server requests will fail with a
+     ``ZEO.Exceptions.ClientDisconnected`` exception.
+
+   - New requests will block for up to ``wait_timeout`` waiting for a
+     connection to be established before failing with a
+     ``ZEO.Exceptions.ClientDisconnected`` exception.
+
+client_label
+   A short string to display in *server* logs for an event relating to
+   this client. This can be helpful when debugging.
+
+disconnect_poll
+   The delay in seconds between attempts to connect to the
+   server, in seconds.  Defaults to 1 second.
+
+SSL configuration
+~~~~~~~~~~~~~~~~~
+
+An ``ssl`` subsection can be used to enable and configure SSL, as in::
+
+  <clientstorage>
+    server 8200
+    <ssl>
+    </ssl>
+  </clientstorage>
+
+In the example above, SSL is enabled in it's simplest form:
+
+- No authentication certificate is supplied
+
+- The server isn't authenticated against certificates.
+
+- The server's host name isn't checked.
+
+A number of settings can be provided to configure SSL:
+
+certificate
+        The path to an SSL certificate file for the client.  This is
+        needed to allow the server to authenticate the client.
+
+key
+        The path to the SSL key file for the client certificate.
+
+authenticate
+        The path to a file or directory containing server certificates
+        to authenticate.  ((See the ``cafile`` and ``capath``
+        parameters in the Python documentation for
+        ``ssl.SSLContext.load_verify_locations``.)
+
+        If this setting is used. then certificate authentication is
+        used to authenticate server.  The server must be configuted
+        with one of the certificates supplied using this setting.
+
+        If this setting has the value ``SIGNED`` or ``-``, then
+        default certificate authoritied from the client host system
+        will be used.
+
+check-hostname
+        This is a boolean setting that defaults to false. Verify the
+        host name in the server certificate is as expected.
+
+server-hostname
+        The expected server host name.  This defaults to the host name
+        used in the server address.  This option must be used when
+        ``verify-host`` is true and when a server address has no host
+        name (localhost, or unix domain socket) or when there is more
+        than one seerver and server hostnames differ.
+
+        Using this setting implies the ``verify-host`` setting.