Merge branch 'master' of github.com:cython/cython-docs

Commit b18cf636 authored Mar 21, 2011 by Robert Bradshaw
8 changed files
--- a/conf.py
+++ b/conf.py
@@ -40,17 +40,19 @@ source_suffix = '.rst'
 # The master toctree document.
 master_doc = 'index'

+exclude_patterns = ['py*', 'build']
+
 # General substitutions.
 project = 'Cython'
-copyright = '2010, Stefan Behnel, Robert Bradshaw, Dag Sverre Seljebotn, Greg Ewing, William Stein, Gabriel Gellner, et al.'
+copyright = '2011, Stefan Behnel, Robert Bradshaw, Dag Sverre Seljebotn, Greg Ewing, William Stein, Gabriel Gellner, et al.'

 # The default replacements for |version| and |release|, also used in various
 # other places throughout the built documents.
 #
 # The short X.Y version.
-version = '0.14'
+version = '0.15'
 # The full version, including alpha/beta/rc tags.
-release = '0.14'
+release = '0.15pre'

 # There are two options for replacing |today|: either, you set today to some
 # non-false value, then it is used:

--- a/src/tutorial/clibraries.rst
+++ b/src/tutorial/clibraries.rst
@@ -238,7 +238,8 @@ flags, such as::
 Once we have compiled the module for the first time, we can now import
 it and instantiate a new Queue::

-    PYTHONPATH=. python -c 'import queue.Queue as Q ; Q()'
+    $ export PYTHONPATH=.
+    $ python -c 'import queue.Queue as Q ; Q()'

 However, this is all our Queue class can do so far, so let's make it
 more usable.

--- a/src/tutorial/readings.rst
+++ b/src/tutorial/readings.rst
@@ -15,12 +15,12 @@ features for managing it.

 Finally, don't hesitate to ask questions (or post reports on
 successes!) on the Cython users mailing list [UserList]_.  The Cython
-developer mailing list, [DevList]_, is also open to everybody.  Feel
-free to use it to report a bug, ask for guidance, if you have time to
-spare to develop Cython, or if you have suggestions for future
-development.
+developer mailing list, [DevList]_, is also open to everybody, but
+focusses on core development issues.  Feel free to use it to report a
+clear bug, to ask for guidance if you have time to spare to develop
+Cython, or if you have suggestions for future development.

-.. [DevList] Cython developer mailing list: http://codespeak.net/mailman/listinfo/cython-dev.
+.. [DevList] Cython developer mailing list: http://mail.python.org/mailman/listinfo/cython-devel
 .. [Seljebotn09] D. S. Seljebotn, Fast numerical computations with Cython,
   Proceedings of the 8th Python in Science Conference, 2009.
 .. [UserList] Cython users mailing list: http://groups.google.com/group/cython-users
--- a/src/tutorial/related_work.rst
+++ b/src/tutorial/related_work.rst
 Related work
 ============

-Pyrex [Pyrex]_ is the compiler project that Cython was originally based on.
-Many features and the major design decisions of the Cython language
-were developed by Greg Ewing as part of that project.  Today, Cython
-supersedes the capabilities of Pyrex by providing a higher
-compatibility with Python code and Python semantics, as well as
-superior optimisations and better integration with scientific Python
-extensions like NumPy.
+Pyrex [Pyrex]_ is the compiler project that Cython was originally
+based on.  Many features and the major design decisions of the Cython
+language were developed by Greg Ewing as part of that project.  Today,
+Cython supersedes the capabilities of Pyrex by providing a
+substantially higher compatibility with Python code and Python
+semantics, as well as superior optimisations and better integration
+with scientific Python extensions like NumPy.

 ctypes [ctypes]_ is a foreign function interface (FFI) for Python.  It
 provides C compatible data types, and allows calling functions in DLLs
@@ -20,23 +20,24 @@ operations must pass through Python code first.  Cython, being a
 compiled language, can avoid much of this overhead by moving more
 functionality and long-running loops into fast C code.

-SWIG [SWIG]_ is a wrapper code generator.  It makes it very easy to parse
-large API definitions in C/C++ header files, and to generate straight
-forward wrapper code for a large set of programming languages.  As
-opposed to Cython, however, it is not a programming language itself.
-Thin wrappers are easy to generate, but the more functionality a
-wrapper needs to provide, the harder it gets to implement it with
-SWIG.  Cython, on the other hand, makes it very easy to write very
-elaborate wrapper code specifically for the Python language.  Also,
-there exists third party code for parsing C header files and using it
-to generate Cython definitions and module skeletons.
+SWIG [SWIG]_ is a wrapper code generator.  It makes it very easy to
+parse large API definitions in C/C++ header files, and to generate
+straightforward wrapper code for a large set of programming
+languages.  As opposed to Cython, however, it is not a programming
+language itself.  Thin wrappers are easy to generate, but the more
+functionality a wrapper needs to provide, the harder it gets to
+implement it with SWIG.  Cython, on the other hand, makes it very easy
+to write very elaborate wrapper code specifically for the Python
+language, and to make it as thin or thick as needed at any given
+place.  Also, there exists third party code for parsing C header files
+and using it to generate Cython definitions and module skeletons.

 ShedSkin [ShedSkin]_ is an experimental Python-to-C++ compiler. It
-uses profiling information and very powerful type inference engine
-to generate a C++ program from (restricted) Python source code. 
-The main drawback is has no support for calling the Python/C API for
-operations it does not support natively, and supports very few of the
-standard Python modules.
+uses a very powerful whole-module type inference engine to generate a
+C++ program from (restricted) Python source code.  The main drawback
+is that it has no support for calling the Python/C API for operations
+it does not support natively, and supports very few of the standard
+Python modules.

 .. [ctypes] http://docs.python.org/library/ctypes.html.
 .. there's also the original ctypes home page: http://python.net/crew/theller/ctypes/

--- a/src/tutorial/strings.rst
+++ b/src/tutorial/strings.rst
@@ -203,18 +203,23 @@ Single bytes and characters
 ---------------------------

 The Python C-API uses the normal C ``char`` type to represent a byte
-value, but it has a special ``Py_UNICODE`` integer type for a Unicode
-code point value, i.e. a single Unicode character.  Since version
-0.13, Cython supports the latter natively, which is either defined as
-an unsigned 2-byte or 4-byte integer, or as ``wchar_t``, depending on
-the platform.  The exact type is a compile time option in the build of
-the CPython interpreter and extension modules inherit this definition
-at C compile time.
-
-In Cython, the ``char`` and ``Py_UNICODE`` types behave differently
-when coercing to Python objects.  Similar to the behaviour of the
-bytes type in Python 3, the ``char`` type coerces to a Python integer
-value by default, so that the following prints 65 and not ``A``::
+value, but it has two special integer types for a Unicode code point
+value, i.e. a single Unicode character: ``Py_UNICODE`` and
+``Py_UCS4``.  Since version 0.13, Cython supports the first natively;
+support for ``Py_UCS4`` is new in Cython 0.15.  ``Py_UNICODE`` is
+either defined as an unsigned 2-byte or 4-byte integer, or as
+``wchar_t``, depending on the platform.  The exact type is a compile
+time option in the build of the CPython interpreter and extension
+modules inherit this definition at C compile time.  The advantage of
+``Py_UCS4`` is that it is guaranteed to be large enough for any
+Unicode code point value, regardless of the platform.  It is defined
+as a 32bit unsigned int or long.
+
+In Cython, the ``char`` type behaves differently from the
+``Py_UNICODE`` and ``Py_UCS4`` types when coercing to Python objects.
+Similar to the behaviour of the bytes type in Python 3, the ``char``
+type coerces to a Python integer value by default, so that the
+following prints 65 and not ``A``::

    # -*- coding: ASCII -*-

@@ -230,31 +235,32 @@ explicitly, and the following will print ``A`` (or ``b'A'`` in Python

 The explicit coercion works for any C integer type.  Values outside of
 the range of a ``char`` or ``unsigned char`` will raise an
-``OverflowError``.  Coercion will also happen automatically when
-assigning to a typed variable, e.g.::
+``OverflowError`` at runtime.  Coercion will also happen automatically
+when assigning to a typed variable, e.g.::

-    cdef bytes py_byte_string = char_val
+    cdef bytes py_byte_string
+    py_byte_string = char_val

-On the other hand, the ``Py_UNICODE`` type is rarely used outside of
-the context of a Python unicode string, so its default behaviour is to
-coerce to a Python unicode object.  The following will therefore print
-the character ``A``::
+On the other hand, the ``Py_UNICODE`` and ``Py_UCS4`` types are rarely
+used outside of the context of a Python unicode string, so their
+default behaviour is to coerce to a Python unicode object.  The
+following will therefore print the character ``A``, as would the same
+code with the ``Py_UNICODE`` type::

-    cdef Py_UNICODE uchar_val = u'A'
+    cdef Py_UCS4 uchar_val = u'A'
    assert uchar_val == 65 # character point value of u'A'
    print( uchar_val )

 Again, explicit casting will allow users to override this behaviour.
 The following will print 65::

-    cdef Py_UNICODE uchar_val = u'A'
-    print( <int>uchar_val )
+    cdef Py_UCS4 uchar_val = u'A'
+    print( <long>uchar_val )

-Note that casting to a C ``int`` (or ``unsigned int``) will do just
-fine on a platform with 32bit or more, as the maximum code point value
-that a Unicode character can have is 1114111 on a 4-byte unicode
-CPython platform ("wide unicode") and 65535 on a narrow (2-byte)
-unicode platform.
+Note that casting to a C ``long`` (or ``unsigned long``) will work
+just fine, as the maximum code point value that a Unicode character
+can have is 1114111 (``0x10FFFF``).  On platforms where ``int`` is at
+least 32 bits wide, ``int`` is just as good.


 Narrow Unicode builds
@@ -263,19 +269,19 @@ Narrow Unicode builds
 In narrow Unicode builds of CPython, i.e. builds where
 ``sys.maxunicode`` is 65535 (such as all Windows builds, as opposed to
 1114111 in wide builds), it is still possible to use Unicode character
-code points that do not fit into the two bytes wide ``Py_UNICODE``
-type.  For example, such a CPython build will accept the unicode
-literal ``u'\U00012345'``.  However, the underlying system level
-encoding leaks into Python space in this case, so that the length of
-this literal becomes 2 instead of 1.  This also shows when iterating
-over it or when indexing into it.  The visible substrings are
-``u'\uD808'`` and ``u'\uDF45'`` in this example.  They form a
-so-called surrogate pair that represents the above character.
+code points that do not fit into the 16 bit wide ``Py_UNICODE`` type.
+For example, such a CPython build will accept the unicode literal
+``u'\U00012345'``.  However, the underlying system level encoding
+leaks into Python space in this case, so that the length of this
+literal becomes 2 instead of 1.  This also shows when iterating over
+it or when indexing into it.  The visible substrings are ``u'\uD808'``
+and ``u'\uDF45'`` in this example.  They form a so-called surrogate
+pair that represents the above character.

 For more information on this topic, it is worth reading the `Wikipedia
 article about the UTF-16 encoding`_.

-.. _`Wikipedia article on the UTF-16 encoding`: http://en.wikipedia.org/wiki/UTF-16/UCS-2
+.. _`Wikipedia article about the UTF-16 encoding`: http://en.wikipedia.org/wiki/UTF-16/UCS-2

 The same properties apply to Cython code that gets compiled for a
 narrow CPython runtime environment.  In most cases, e.g. when
@@ -298,6 +304,20 @@ in question.  Looking for substrings works correctly because the two
 code units in the surrogate pair use distinct value ranges, so the
 pair is always identifiable in a sequence of code points.

+As of version 0.15, Cython has extended support for surrogate pairs so
+that you can safely use an ``in`` test to search for character values
+from the full ``Py_UCS4`` range even on narrow platforms::
+
+    cdef Py_UCS4 uchar = 0x12345
+    print( uchar in some_unicode_string )
+
+Similarly, it can coerce a one-character string with a high Unicode
+code point value to a ``Py_UCS4`` value on both narrow and wide Unicode
+platforms::
+
+    cdef Py_UCS4 uchar = u'\U00012345'
+    assert uchar == 0x12345
+

 Iteration
 ---------
@@ -321,7 +341,7 @@ The same applies to bytes objects::
        if c == 'A': ...

 For unicode objects, Cython will automatically infer the type of the
-loop variable as ``Py_UNICODE``::
+loop variable as ``Py_UCS4``::

    cdef unicode ustring = ...

@@ -335,13 +355,16 @@ value to be a Python object, so Cython may end up generating redundant
 conversion code for the loop variable value inside of the loop.  If
 this leads to a performance degradation for a specific piece of code,
 you can either type the loop variable as a Python object explicitly,
-or assign it to a Python typed temporary variable to enforce one-time
-coercion before running Python operations on it.
+or assign its value to a Python typed variable somewhere inside of the
+loop to enforce one-time coercion before running Python operations on
+it.

-There is also an optimisation for ``in`` tests, so that the following
+There are also optimisations for ``in`` tests, so that the following
 code will run in plain C code (actually using a switch statement)::

-    cdef Py_UNICODE uchar_val = get_a_unicode_character()
+    cdef Py_UCS4 uchar_val = get_a_unicode_character()
    if uchar_val in u'abcABCxY':
        ...

+Combined with the looping optimisation above, this can result in very
+efficient character switching code, e.g. in unicode parsers.
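
Note on the strings.rst changes above: the surrogate pair quoted for ``u'\U00012345'`` (``u'\uD808'`` and ``u'\uDF45'``) follows from standard UTF-16 arithmetic. The sketch below reproduces it in plain Python rather than Cython. The function names ``to_surrogate_pair`` and ``from_surrogate_pair`` are hypothetical helpers for illustration only, not part of Cython or CPython, and this is not how Cython implements its ``Py_UCS4`` support.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point (above 0xFFFF) into two UTF-16 code units.

    Hypothetical helper for illustration; not a Cython or CPython API.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % code_point)
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # upper 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # lower 10 bits go into the low surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a high/low surrogate pair into a single code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x12345)
print("%04X %04X" % (high, low))  # D808 DF45
```

The two code units land in disjoint ranges (0xD800-0xDBFF and 0xDC00-0xDFFF), which is why a pair remains identifiable inside a longer sequence, as the documentation text states.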
--- a/src/userguide/early_binding_for_speed.rst
+++ b/src/userguide/early_binding_for_speed.rst
@@ -53,7 +53,7 @@ where calls occur within Cython code. For example:
        def __init__(self, int x0, int y0, int x1, int y1):
            self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
        cdef int _area(self):
-            int area
+            cdef int area
            area = (self.x1 - self.x0) * (self.y1 - self.y0)
            if area < 0:
                area = -area
@@ -88,7 +88,7 @@ overheads. Consider this code:
        def __init__(self, int x0, int y0, int x1, int y1):
            self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
        cpdef int area(self):
-            int area
+            cdef int area
            area = (self.x1 - self.x0) * (self.y1 - self.y0)
            if area < 0:
                area = -area

--- a/src/userguide/limitations.rst
+++ b/src/userguide/limitations.rst
@@ -30,7 +30,7 @@ Other Current Limitations
 ==========================

 * The :func:`globals` builtin returns the last Python caller's globals, not the current function's locals. This behavior should not be relied upon, as it will probably change in the future.
-* The :fun:`locals` builtin can only be used if all local variables can be converted to Python objects, and returns a dict.
+* The :func:`locals` builtin can only be used if all local variables can be converted to Python objects, and returns a dict.
 * Class and function definitions cannot be placed inside control structures.

 Semantic differences between Python and Cython

--- a/src/userguide/pxd_package.rst
+++ b/src/userguide/pxd_package.rst
@@ -15,7 +15,7 @@ called 'foo', and its fully qualified module name is
 'foo.shrubbing'.

 So when Pyrex wants to find out whether there is a `.pxd` file for shrubbing,
-it looks for one corresponding to a module called :module:`foo.shrubbing`. It
+it looks for one corresponding to a module called `foo.shrubbing`. It
 does this by searching the include path for a top-level package directory
 called 'foo' containing a file called 'shrubbing.pxd'.