Commit 9372d2c3 authored by David Wilson

docs: Fix up tons of references, document trust chain

parent 05128543
......@@ -32,21 +32,22 @@ bootstrap implementation sent to every new slave context.
Decorator that marks a function or class method to automatically receive a
kwarg named `econtext`, referencing the
:py:class:`econtext.core.ExternalContext` active in the context in which
the function is being invoked in. The decorator is only meaningful when the
function is invoked via :py:data:`econtext.core.CALL_FUNCTION`.
:py:class:`mitogen.core.ExternalContext` active in the context in which the
function is being invoked in. The decorator is only meaningful when the
function is invoked via :py:data:`CALL_FUNCTION
<mitogen.core.CALL_FUNCTION>`.
When the function is invoked directly, `econtext` must still be passed to it
explicitly.
When the function is invoked directly, `econtext` must still be passed to
it explicitly.
.. currentmodule:: mitogen.core
.. decorator:: takes_router
Decorator that marks a function or class method to automatically receive a
kwarg named `router`, referencing the :py:class:`econtext.core.Router`
kwarg named `router`, referencing the :py:class:`mitogen.core.Router`
active in the context in which the function is being invoked in. The
decorator is only meaningful when the function is invoked via
:py:data:`econtext.core.CALL_FUNCTION`.
:py:data:`CALL_FUNCTION <mitogen.core.CALL_FUNCTION>`.
When the function is invoked directly, `router` must still be passed to it
explicitly.
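
The marking behaviour described for these decorators can be illustrated with a minimal sketch. This is not Mitogen's implementation: the ``takes_router`` attribute name and the ``dispatch()`` helper are hypothetical stand-ins for whatever the real ``CALL_FUNCTION`` dispatch path does.

```python
def takes_router(func):
    """Mark func so a dispatcher knows to inject a `router` kwarg."""
    func.takes_router = True  # hypothetical marker attribute
    return func


def dispatch(func, router, *args, **kwargs):
    """Hypothetical stand-in for invocation via CALL_FUNCTION."""
    if getattr(func, "takes_router", False):
        # Only the dispatcher injects the kwarg; direct callers must pass it.
        kwargs.setdefault("router", router)
    return func(*args, **kwargs)


@takes_router
def describe(prefix, router=None):
    return "%s:%s" % (prefix, router)
```

Under these assumptions, ``dispatch(describe, router, "x")`` supplies ``router`` automatically, while a direct call such as ``describe("x", router=router)`` must still pass it explicitly.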
......@@ -269,8 +270,9 @@ Router Class
.. class:: Router
Route messages between parent and child contexts, and invoke handlers
defined on our parent context. Router.route() straddles the Broker and user
threads, it is safe to call anywhere.
defined on our parent context. :py:meth:`Router.route() <route>` straddles
the :py:class:`Broker <mitogen.core.Broker>` and user threads, it is safe
to call anywhere.
**Note:** This is the somewhat limited core version of the Router class
used by child contexts. The master subclass is documented below this one.
......@@ -312,8 +314,8 @@ Router Class
:param mitogen.core.Context respondent:
Context that messages to this handle are expected to be sent from.
If specified, arranges for ``_DEAD`` to be delivered to `fn` when
disconncetion of the context is detected.
If specified, arranges for :py:data:`_DEAD` to be delivered to `fn`
when disconnection of the context is detected.
In future `respondent` will likely also be used to prevent other
contexts from sending messages to the handle.
......@@ -635,9 +637,9 @@ Receiver Class
:param mitogen.core.Context respondent:
Reference to the context this receiver is receiving from. If not
``None``, arranges for the receiver to receive
:py:data:`mitogen.core._DEAD` if messages can no longer be routed to
the context, due to disconnection or exit.
``None``, arranges for the receiver to receive :py:data:`_DEAD` if
messages can no longer be routed to the context, due to disconnection
or exit.
.. attribute:: notify = None
......@@ -718,7 +720,7 @@ Sender Class
.. py:method:: close ()
Send :py:data:`mitogen.core._DEAD` to the remote end, causing
Send :py:data:`_DEAD` to the remote end, causing
:py:meth:`ChannelError` to be raised in any waiting thread.
.. py:method:: put (data)
......@@ -745,12 +747,15 @@ Channel Class
Broker Class
============
.. currentmodule:: mitogen.master
.. currentmodule:: mitogen.core
.. autoclass:: Broker
:members:
:inherited-members:
.. currentmodule:: mitogen.master
.. autoclass:: Broker
:members:
Utility Functions
=================
......
......@@ -198,6 +198,8 @@ active, so that ``print`` statements and suchlike promptly appear in the logs.
Function Call Dispatch
######################
.. currentmodule:: mitogen.core
After all initialization is complete, the child's main thread sits in a loop
reading from a :py:class:`Channel <mitogen.core.Channel>` connected to the
:py:data:`CALL_FUNCTION <mitogen.core.CALL_FUNCTION>` handle. This handle is
......@@ -205,17 +207,26 @@ written to by
:py:meth:`call() <mitogen.master.Context.call>`
and :py:meth:`call_async() <mitogen.master.Context.call_async>`.
:py:data:`CALL_FUNCTION <mitogen.core.CALL_FUNCTION>` only accepts requests
from the context IDs listed in :py:data:`mitogen.parent_ids`, forming a chain
of trust between the master and any intermediate context leading to the
recipient of the message. In combination with :ref:`source-verification`, this
is a major contributor to ensuring contexts running on compromised
infrastructure cannot trigger code execution in siblings or any parent.
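
As a rough sketch of the check described above, assuming the sender's context ID is available as ``src_id`` and the trusted IDs as a plain list, a handler might gate calls as follows. Both function names are hypothetical and the return values are invented for illustration.

```python
def is_trusted_caller(src_id, parent_ids):
    """Accept a request only if it originates from a parent context."""
    return src_id in parent_ids


def handle_call_request(src_id, parent_ids, func, args):
    """Hypothetical gate placed in front of function call dispatch."""
    if not is_trusted_caller(src_id, parent_ids):
        # Siblings and children are never allowed to trigger execution.
        return ("rejected", None)
    return ("ok", func(*args))
```

The check is only as strong as the accuracy of ``src_id``, which is why it depends on source verification.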
Shutdown
########
.. currentmodule:: mitogen.core
When a context receives :py:data:`SHUTDOWN <mitogen.core.SHUTDOWN>` from its
immediate parent, it closes its own :py:data:`CALL_FUNCTION
<mitogen.core.CALL_FUNCTION>` :py:class:`Channel <mitogen.core.Channel>` before
sending ``SHUTDOWN`` to any directly connected children. Closing the channel
has the effect of causing :py:meth:`ExternalContext._dispatch_calls()
<mitogen.core.ExternalContext._dispatch_calls>` to exit and begin joining on
the broker thread.
sending :py:data:`SHUTDOWN <mitogen.core.SHUTDOWN>` to any directly connected
children. Closing the channel has the effect of causing
:py:meth:`ExternalContext._dispatch_calls` to exit and begin joining on the
broker thread.
During shutdown, the master waits up to 5 seconds for children to disconnect
gracefully before force disconnecting them, while children will use that time
......@@ -234,7 +245,7 @@ irritating delays would often be experienced during program termination.
If the main thread (responsible for function call dispatch) fails to shut down
gracefully, because some user function is hanging, it will still be cleaned up
since as the final step in broker shutdown, the broker sends
:py:data:`signal.SIGTERM` to its own process.
:py:mod:`signal.SIGTERM <signal>` to its own process.
.. _stream-protocol:
......@@ -242,6 +253,8 @@ since as the final step in broker shutdown, the broker sends
Stream Protocol
---------------
.. currentmodule:: mitogen.core
Once connected, a basic framing protocol is used to communicate between
parent and child:
......@@ -271,19 +284,21 @@ Masters listen on the following handles:
.. data:: mitogen.core.GET_MODULE
Receives the name of a module to load `fullname`, locates the source code
for ``fullname``, and routes one or more ``LOAD_MODULE`` messages back
towards the sender of the ``GET_MODULE`` request. See below for a longer
discussion of ``GET_MODULE``/``LOAD_MODULE``.
for ``fullname``, and routes one or more :py:data:`LOAD_MODULE` messages
back towards the sender of the :py:data:`GET_MODULE` request. If lookup
fails, ``None`` is sent instead.
See :ref:`import-preloading` for a deeper discussion of
:py:data:`GET_MODULE`/:py:data:`LOAD_MODULE`.
If lookup fails, ``None`` is sent instead.
.. data:: mitogen.core.ALLOCATE_ID
Replies to any message sent to it with a newly allocated unique context ID,
to allow children to safely start their own contexts. In future this is
likely to be replaced by 32-bit context IDs and pseudorandom allocation,
with an improved ``ADD_ROUTE`` message sent upstream rather than downstream
that generates NACKs if any ancestor detects an ID collision.
with an improved :py:data:`ADD_ROUTE` message sent upstream rather than
downstream that generates NACKs if any ancestor detects an ID collision.
Children listen on the following handles:
......@@ -295,16 +310,16 @@ Children listen on the following handles:
* **pkg_present**: Either ``None`` for a plain ``.py`` module, or a list of
canonical names of submodules existing within this package. For example, a
``LOAD_MODULE`` for the ``mitogen`` package would return a list like:
`["mitogen.core", "mitogen.fakessh", "mitogen.fakessh", ..]`. This list
is used by children to avoid generating useless round-trips due to Python
2.x's ``import`` statement behavior.
:py:data:`LOAD_MODULE` for the :py:mod:`mitogen` package would return a
list like: `["mitogen.core", "mitogen.fakessh", "mitogen.master", ..]`.
This list is used by children to avoid generating useless round-trips due
to Python 2.x's ``import`` statement behavior.
* **path**: Original filesystem path where the module was found on the master.
* **compressed**: :py:mod:`zlib`-compressed module source code.
* **related**: list of canonical module names on which this module appears
to depend. Used by children that have ever started any children of their
own to preload those children with ``LOAD_MODULE`` messages in response
to a ``GET_MODULE`` request.
own to preload those children with :py:data:`LOAD_MODULE` messages in
response to a :py:data:`GET_MODULE` request.
.. _CALL_FUNCTION:
.. data:: mitogen.core.CALL_FUNCTION
......@@ -315,53 +330,53 @@ Children listen on the following handles:
imports ``mod_name``, then attempts to execute
`class_name.func_name(\*args, \**kwargs)`.
When this channel is closed (by way of sending ``_DEAD`` to it), the
child's main thread begins graceful shutdown of its own `Broker` and
`Router`.
When this channel is closed (by way of sending :py:data:`_DEAD` to it), the
child's main thread begins graceful shutdown of its own :py:class:`Broker`
and :py:class:`Router`.
.. data:: mitogen.core.SHUTDOWN
When received from a child's immediate parent, causes the broker thread to
enter graceful shutdown, including writing ``_DEAD`` to the child's main
thread, causing it to join on the exit of the broker thread.
enter graceful shutdown, including writing :py:data:`_DEAD` to the child's
main thread, causing it to join on the exit of the broker thread.
The final step of a child's broker shutdown process sends
:py:data:`signal.SIGTERM` to itself, ensuring the process dies even if the
main thread was hung executing user code.
:py:mod:`signal.SIGTERM <signal>` to itself, ensuring the process dies even
if the main thread was hung executing user code.
Each context is responsible for sending ``SHUTDOWN`` to each of its
directly connected children in response to the master sending ``SHUTDOWN``
to it, and arranging for the connection to its parent to be closed shortly
thereafter.
Each context is responsible for sending :py:data:`SHUTDOWN` to each of its
directly connected children in response to the master sending
:py:data:`SHUTDOWN` to it, and arranging for the connection to its parent
to be closed shortly thereafter.
.. _ADD_ROUTE:
.. data:: mitogen.core.ADD_ROUTE
Receives `(target_id, via_id)` integer tuples, describing how messages
arriving at this context on any Stream should be forwarded on the stream
associated with the Context `via_id` such that they are eventually
delivered to the target Context.
arriving at this context on any stream should be forwarded on the stream
associated with the context `via_id` such that they are eventually
delivered to the target context.
This message is necessary to inform intermediary contexts of the existence
of a downstream Context, as they do not otherwise parse traffic they are
forwarding to their downstream contexts that may cause new contexts to be
established.
Given a chain `master -> ssh1 -> sudo1`, no `ADD_ROUTE` message is
Given a chain `master -> ssh1 -> sudo1`, no :py:data:`ADD_ROUTE` message is
necessary, since :py:class:`mitogen.core.Router` in the `ssh` context can
arrange to update its routes while setting up the new child during
`proxy_connect()`.
:py:meth:`Router.proxy_connect() <mitogen.master.Router.proxy_connect>`.
However, given a chain like `master -> ssh1 -> sudo1 -> ssh2 -> sudo2`,
`ssh1` requires an `ADD_ROUTE` for `ssh2`, and both `ssh1` and `sudo1`
require an `ADD_ROUTE` for `sudo2`, as neither directly dealt with its
establishment.
`ssh1` requires an :py:data:`ADD_ROUTE` for `ssh2`, and both `ssh1` and
`sudo1` require an :py:data:`ADD_ROUTE` for `sudo2`, as neither directly
dealt with its establishment.
Children that have ever been used to create a descendent child also listen on
the following handles:
.. data:: mitogen.core.GET_MODULE
.. data:: GET_MODULE
As with master's ``GET_MODULE``, except this implementation
(:py:class:`mitogen.master.ModuleForwarder`) serves responses using
......@@ -434,16 +449,18 @@ communicate with.
When :py:class:`mitogen.core.Router` receives a message, it checks the IDs
associated with its directly connected streams for a potential route. If any
stream matches, either because it directly connects to the target ID, or
because the master sent an ``ADD_ROUTE`` message associating it, then the
message will be forwarded down the tree using that stream.
because the master sent an :py:data:`ADD_ROUTE <mitogen.core.ADD_ROUTE>`
message associating it, then the message will be forwarded down the tree using
that stream.
If the message does not match any ``ADD_ROUTE`` message or stream, instead it
is forwarded upwards to the immediate parent, and recursively by each parent in
turn until one is reached that knows how to forward the message down the tree.
If the message does not match any :py:data:`ADD_ROUTE <mitogen.core.ADD_ROUTE>`
message or stream, instead it is forwarded upwards to the immediate parent, and
recursively by each parent in turn until one is reached that knows how to
forward the message down the tree.
When the master establishes a new context via an existing child context, it
sends corresponding ``ADD_ROUTE`` messages to each indirect parent between the
context and the root.
sends corresponding :py:data:`ADD_ROUTE <mitogen.core.ADD_ROUTE>` messages to
each indirect parent between the context and the root.
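
The forwarding decision described above can be sketched with a toy routing table. ``ToyRouter`` is a hypothetical illustration, not :py:class:`mitogen.core.Router`, and dropping the message at the root is an assumption made for this sketch.

```python
class ToyRouter:
    """Toy model of the down-the-tree / up-the-tree routing decision."""

    def __init__(self, parent_id=None):
        self.parent_id = parent_id  # None models the root of the tree
        self.routes = {}            # target context ID -> ID of stream to use

    def add_direct(self, context_id):
        # A directly connected stream routes to its own context ID.
        self.routes[context_id] = context_id

    def add_route(self, target_id, via_id):
        # Models receipt of an ADD_ROUTE (target_id, via_id) tuple.
        self.routes[target_id] = via_id

    def next_hop(self, dst_id):
        if dst_id in self.routes:
            return ("down", self.routes[dst_id])
        if self.parent_id is not None:
            return ("up", self.parent_id)
        return ("drop", None)  # assumption: the root has nowhere to forward
```

For example, a context with parent ``0``, a direct child ``2`` and a route ``(5, 2)`` forwards messages for ``5`` down via ``2``, and anything it does not recognise up to ``0``.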
Example
......@@ -464,6 +481,24 @@ When ``sudo:node22a:webapp`` wants to send a message to
.. image:: images/route.png
.. _source-verification:
Source Verification
###################
Before forwarding or dispatching a message it has received,
:py:class:`mitogen.core.Router` first looks up the corresponding
:py:class:`mitogen.core.Stream` it would use to send responses towards the
message source, and if the looked up stream does not match the stream on which
the message was received, the message is discarded and a warning is logged.
This creates a trust chain leading up to the root of the tree, preventing
downstream contexts from injecting messages appearing to be from the master or
any more trustworthy parent. In this way, privileged functionality such as
:py:data:`CALL_FUNCTION <mitogen.core.CALL_FUNCTION>` can base trust decisions
on the accuracy of :py:ref:`src_id <stream-protocol>`.
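
A minimal sketch of this check, assuming routes are held in a dictionary mapping context IDs to stream names and that unknown IDs are reached via the parent stream. ``verify_source()`` and its parameters are hypothetical names.

```python
def verify_source(src_id, received_on, routes, parent_stream):
    """Return True if the message arrived on the stream replies would use."""
    expected = routes.get(src_id, parent_stream)
    return expected == received_on
```

With ``routes = {2: "stream-2", 5: "stream-2"}``, a message claiming ``src_id=0`` that arrives on ``"stream-2"`` fails the check, since replies to context ``0`` would travel via the parent stream.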
Future
######
......@@ -488,24 +523,25 @@ The Module Importer
are a variety of approaches to implementing it, and the present implementation
is not perfectly efficient in every case.
It operates by intercepting ``import`` statements via `sys.meta_path`, asking
Python if it can satisfy the import by itself, and if not, indicating to Python
that it is capable of loading the module.
It operates by intercepting ``import`` statements via :py:data:`sys.meta_path`,
asking Python if it can satisfy the import by itself, and if not, indicating to
Python that it is capable of loading the module.
In :py:meth:`load_module() <mitogen.core.Importer.load_module>` an RPC is
started to the parent context, requesting the module source code by way of a
``GET_MODULE``. If the parent context does not have the module available, it
recursively forwards the request upstream, while avoiding duplicate requests
for the same module from its own threads and any child contexts.
:py:data:`GET_MODULE <mitogen.core.GET_MODULE>`. If the parent context does not
have the module available, it recursively forwards the request upstream, while
avoiding duplicate requests for the same module from its own threads and any
child contexts.
Neutralizing ``__main__``
#########################
Neutralizing :py:mod:`__main__`
###############################
To avoid accidental execution of the ``__main__`` module's code in a slave
context, when serving the source of the main module, Mitogen removes any code
occurring after the first conditional that looks like a standard ``__main__``
execution guard:
To avoid accidental execution of the :py:mod:`__main__` module's code in a
slave context, when serving the source of the main module, Mitogen removes any
code occurring after the first conditional that looks like a standard
:py:mod:`__main__` execution guard:
.. code-block:: python
......@@ -530,8 +566,9 @@ requests will be made for modules that do not exist. For example:
import sys
import os
In Python 2.x, Python will first try to load ``mypkg.sys`` and ``mypkg.os``,
which do not exist, before falling back on :py:mod:`sys` and :py:mod:`os`.
In Python 2.x, Python will first try to load :py:mod:`mypkg.sys` and
:py:mod:`mypkg.os`, which do not exist, before falling back on :py:mod:`sys`
and :py:mod:`os`.
These negative imports present a challenge, as they introduce a large number of
pointless network round-trips. Therefore in addition to the
......@@ -545,9 +582,13 @@ module does not appear in the enumeration of child modules belonging to the
package that was provided by the master.
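
One way a child could use the package enumeration to skip such requests is sketched below. ``should_request()`` is a hypothetical helper, and ``pkg_present`` is assumed to be a dictionary mapping each loaded package name to the submodule list received in its ``LOAD_MODULE`` message.

```python
def should_request(fullname, pkg_present):
    """Decide whether requesting fullname from the parent is worthwhile."""
    pkg, _, _ = fullname.rpartition(".")
    present = pkg_present.get(pkg)
    if present is None:
        # Nothing is known about the parent package, so ask upstream.
        return True
    # The master enumerated this package; absent names cannot exist.
    return fullname in present
```

With ``pkg_present = {"mypkg": ["mypkg.a"]}``, a lookup for ``mypkg.sys`` is refused locally, while ``mypkg.a`` and top-level ``sys`` are still requested or resolved as usual.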
.. _import-preloading:
Import Preloading
#################
.. currentmodule:: mitogen.core
To further avoid round-trips, when a module or package is requested by a child,
its bytecode is scanned in the master to find all the module's ``import``
statements, and of those, which associated modules appear to have been loaded
......@@ -563,20 +604,23 @@ Before replying to a child's request for a module with dependencies:
* If the request is for a package, any dependent modules used by the package
that appear within the package itself are known to be missing from the child,
since the child requested the top-level package module, therefore they are
pre-loaded into the child using ``LOAD_MODULE`` messages before sending the
``LOAD_MODULE`` message for the requested package module itself. In this way,
the child will already have dependent modules cached by the time it receives
the requested module, avoiding one round-trip for each dependency.
For example, when a child requests the ``django`` package, and the master
determines the ``django`` module code in the master has import statements for
``django.utils``, ``django.utils.lru_cache``, and ``django.utils.version``,
pre-loaded into the child using :py:data:`LOAD_MODULE` messages before
sending the :py:data:`LOAD_MODULE` message for the requested package module
itself. In this way, the child will already have dependent modules cached by
the time it receives the requested module, avoiding one round-trip for each
dependency.
For example, when a child requests the :py:mod:`django` package, and the master
determines the :py:mod:`django` module code in the master has import
statements for :py:mod:`django.utils`, :py:mod:`django.utils.lru_cache`, and
:py:mod:`django.utils.version`,
and that execution of the module code on the master caused those modules to
appear in the master's :py:data:`sys.modules`, there is high probability
execution of the ``django`` module code in the child will cause the same
modules to be loaded. Since all those modules exist within the ``django``
package, and we already know the child lacks that package, it is safe to
assume the child will make follow-up requests for those modules too.
execution of the :py:mod:`django` module code in the child will cause the
same modules to be loaded. Since all those modules exist within the
:py:mod:`django` package, and we already know the child lacks that package,
it is safe to assume the child will make follow-up requests for those modules
too.
In the example, this replaces 4 round-trips with 1 round-trip.
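
The scan described above can be approximated with the standard :py:mod:`dis` module. This is a sketch rather than Mitogen's scanner: both function names are hypothetical, only the top-level code object is inspected, and the ``loaded`` parameter exists so the example does not depend on the real :py:data:`sys.modules`.

```python
import dis
import sys


def imported_names(source):
    """List module names referenced by IMPORT_NAME in top-level bytecode."""
    code = compile(source, "<sketch>", "exec")
    return [ins.argval for ins in dis.get_instructions(code)
            if ins.opname == "IMPORT_NAME"]


def preload_candidates(source, package, loaded=None):
    """Names inside `package` that the source imports and that were loaded."""
    loaded = sys.modules if loaded is None else loaded
    prefix = package + "."
    found = []
    for name in imported_names(source):
        if name.startswith(prefix) and name in loaded and name not in found:
            found.append(name)
    return found
```

Only names that are both imported by the module and already loaded on the master are selected, mirroring the reasoning that the child is likely to load the same modules.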
......@@ -586,16 +630,18 @@ the name of the package for one final optimization:
* If the request is for a sub-module of a package, and it is known the child
loaded the package's implementation from the parent, then any dependent
modules of the requested module at any nesting level within the package that
is known to be missing are sent using ``LOAD_MODULE`` messages before sending
the ``LOAD_MODULE`` message for the requested module, avoiding 1 round-trip
for each dependency within the same top-level package.
For example, when a child has previously requested the ``django`` package,
the parent knows it was completely absent on the child. Therefore when the
child subsequently requests the ``django.db`` module, it is safe to assume
the child will generate subsequent ``GET_MODULE`` requests for the 2
``django.conf``, 3 ``django.core``, 2 ``django.db``, 3 ``django.dispatch``,
and 7 ``django.utils`` indirect dependencies for ``django.db``.
is known to be missing are sent using :py:data:`LOAD_MODULE` messages before
sending the :py:data:`LOAD_MODULE` message for the requested module, avoiding
1 round-trip for each dependency within the same top-level package.
For example, when a child has previously requested the :py:mod:`django`
package module, the parent knows the package was completely absent on the
child. Therefore when the child subsequently requests the
:py:mod:`django.db` package module, it is safe to assume the child will
generate subsequent :py:data:`GET_MODULE` requests for the 2
:py:mod:`django.conf`, 3 :py:mod:`django.core`, 2 :py:mod:`django.db`, 3
:py:mod:`django.dispatch`, and 7 :py:mod:`django.utils` indirect dependencies
for :py:mod:`django.db`.
In the example, this replaces 17 round-trips with 1 round-trip.
......
......@@ -162,6 +162,13 @@ Other Stream Subclasses
:members:
Importer Class
--------------
.. currentmodule:: mitogen.core
.. autoclass:: Importer
:members:
ExternalContext Class
---------------------
......@@ -201,6 +208,9 @@ ExternalContext Class
The :py:class:`IoLogger` connected to ``stderr``.
.. method:: _dispatch_calls
Implementation for the main thread in every child context.
mitogen.master
==============
......