Commit f15351d9 authored by Florent Xicluna

Merged revisions 78838-78839,78917,78919,78934,78937 via svnmerge from

svn+ssh://pythondev@svn.python.org/python/trunk

........
  r78838 | florent.xicluna | 2010-03-11 15:36:19 +0100 (jeu, 11 mar 2010) | 2 lines

  Issue #6472: The xml.etree package is updated to ElementTree 1.3.  The cElementTree module is updated too.
........
  r78839 | florent.xicluna | 2010-03-11 16:55:11 +0100 (jeu, 11 mar 2010) | 2 lines

  Fix repr of tree Element on windows.
........
  r78917 | florent.xicluna | 2010-03-13 12:18:49 +0100 (sam, 13 mar 2010) | 2 lines

  Move the xml test data to their own directory.
........
  r78919 | florent.xicluna | 2010-03-13 13:41:48 +0100 (sam, 13 mar 2010) | 2 lines

  Do not chdir when running test_xml_etree, and enhance the findfile helper.
........
  r78934 | florent.xicluna | 2010-03-13 18:56:19 +0100 (sam, 13 mar 2010) | 2 lines

  Update some parts of the xml.etree documentation.
........
  r78937 | florent.xicluna | 2010-03-13 21:30:15 +0100 (sam, 13 mar 2010) | 3 lines

  Add the keyword argument "method=None" to the .write() method and the tostring/tostringlist functions.
  Update the function, class and method signatures, according to the new convention.
........
parent 9451a1c6
......@@ -6,9 +6,9 @@
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>
The Element type is a flexible container object, designed to store hierarchical
data structures in memory. The type can be described as a cross between a list
and a dictionary.
The :class:`Element` type is a flexible container object, designed to store
hierarchical data structures in memory. The type can be described as a cross
between a list and a dictionary.
Each element has a number of properties associated with it:
......@@ -23,7 +23,8 @@ Each element has a number of properties associated with it:
* a number of child elements, stored in a Python sequence
To create an element instance, use the Element or SubElement factory functions.
To create an element instance, use the :class:`Element` constructor or the
:func:`SubElement` factory function.
The :class:`ElementTree` class can be used to wrap an element structure, and
convert it from and to XML.
......@@ -31,8 +32,14 @@ convert it from and to XML.
A C implementation of this API is available as :mod:`xml.etree.cElementTree`.
See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs. Fredrik Lundh's page is also the location of the development version of the
xml.etree.ElementTree.
docs. Fredrik Lundh's page is also the location of the development version of
the xml.etree.ElementTree.
.. versionchanged:: 2.7
The ElementTree API is updated to 1.3. For more information, see
`Introducing ElementTree 1.3
<http://effbot.org/zone/elementtree-13-intro.htm>`_.
.. _elementtree-functions:
......@@ -43,16 +50,16 @@ Functions
.. function:: Comment(text=None)
Comment element factory. This factory function creates a special element
that will be serialized as an XML comment. The comment string can be either
an ASCII-only :class:`bytes` object or a :class:`str` object. *text* is a
that will be serialized as an XML comment by the standard serializer. The
comment string can be either a bytestring or a Unicode string. *text* is a
string containing the comment string. Returns an element instance
representing a comment.
.. function:: dump(elem)
Writes an element tree or element structure to sys.stdout. This function should
be used for debugging only.
Writes an element tree or element structure to sys.stdout. This function
should be used for debugging only.
The exact output format is implementation dependent. In this version, it's
written as an ordinary XML file.
......@@ -60,24 +67,20 @@ Functions
*elem* is an element tree or an individual element.
.. function:: Element(tag, attrib={}, **extra)
.. function:: fromstring(text)
Element factory. This function returns an object implementing the standard
Element interface. The exact class or type of that object is implementation
dependent, but it will always be compatible with the _ElementInterface class in
this module.
Parses an XML section from a string constant. Same as XML. *text* is a
string containing XML data. Returns an :class:`Element` instance.
The element name, attribute names, and attribute values can be either an
ASCII-only :class:`bytes` object or a :class:`str` object. *tag* is the
element name. *attrib* is an optional dictionary, containing element
attributes. *extra* contains additional attributes, given as keyword
arguments. Returns an element instance.
.. function:: fromstringlist(sequence, parser=None)
.. function:: fromstring(text)
Parses an XML document from a sequence of string fragments. *sequence* is a
list or other sequence containing XML data fragments. *parser* is an
optional parser instance. If not given, the standard :class:`XMLParser`
parser is used. Returns an :class:`Element` instance.
Parses an XML section from a string constant. Same as XML. *text* is a string
containing XML data. Returns an Element instance.
.. versionadded:: 2.7
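As a rough illustration of the ``fromstringlist`` entry above, the sketch below parses a document that arrives as several string fragments. The fragment contents and variable names are made up for the example, and it assumes a Python 3 interpreter with the standard ``xml.etree.ElementTree`` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: one XML document split into string fragments.
fragments = ["<root>", "<item>one</item>", "<item>two</item>", "</root>"]

# fromstringlist() feeds the fragments to the parser in order and
# returns the root Element of the resulting document.
root = ET.fromstringlist(fragments)

# Collect the text of each <item> child, in document order.
item_texts = [item.text for item in root.findall("item")]
```

Passing the fragments directly avoids joining them into one large string first, which is the main reason to prefer this function over ``fromstring`` for chunked input.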
.. function:: iselement(element)
......@@ -86,12 +89,14 @@ Functions
element instance. Returns a true value if this is an element object.
.. function:: iterparse(source, events=None)
.. function:: iterparse(source, events=None, parser=None)
Parses an XML section into an element tree incrementally, and reports what's
going on to the user. *source* is a filename or file object containing XML data.
*events* is a list of events to report back. If omitted, only "end" events are
reported. Returns an :term:`iterator` providing ``(event, elem)`` pairs.
going on to the user. *source* is a filename or file object containing XML
data. *events* is a list of events to report back. If omitted, only "end"
events are reported. *parser* is an optional parser instance. If not
given, the standard :class:`XMLParser` parser is used. Returns an
:term:`iterator` providing ``(event, elem)`` pairs.
.. note::
......@@ -107,187 +112,258 @@ Functions
.. function:: parse(source, parser=None)
Parses an XML section into an element tree. *source* is a filename or file
object containing XML data. *parser* is an optional parser instance. If not
given, the standard XMLTreeBuilder parser is used. Returns an ElementTree
instance.
object containing XML data. *parser* is an optional parser instance. If
not given, the standard :class:`XMLParser` parser is used. Returns an
:class:`ElementTree` instance.
.. function:: ProcessingInstruction(target, text=None)
PI element factory. This factory function creates a special element that will
be serialized as an XML processing instruction. *target* is a string containing
the PI target. *text* is a string containing the PI contents, if given. Returns
an element instance, representing a processing instruction.
PI element factory. This factory function creates a special element that
will be serialized as an XML processing instruction. *target* is a string
containing the PI target. *text* is a string containing the PI contents, if
given. Returns an element instance, representing a processing instruction.
.. function:: register_namespace(prefix, uri)
Registers a namespace prefix. The registry is global, and any existing
mapping for either the given prefix or the namespace URI will be removed.
*prefix* is a namespace prefix. *uri* is a namespace uri. Tags and
attributes in this namespace will be serialized with the given prefix, if at
all possible.
.. versionadded:: 2.7
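A minimal sketch of how ``register_namespace`` might affect serialization. The prefix ``ex`` and the URI ``http://example.com/ns`` are placeholders chosen for this example, and ``encoding="unicode"`` is a Python 3 convenience that is not part of the 2.7 signature documented here.

```python
import xml.etree.ElementTree as ET

# Placeholder prefix and URI; the registry is global, so this call
# affects every later serialization in the same process.
ET.register_namespace("ex", "http://example.com/ns")

# Namespaced tags use the {uri}local form.
elem = ET.Element("{http://example.com/ns}item")

# Assumption: Python 3, where encoding="unicode" returns a str.
xml_text = ET.tostring(elem, encoding="unicode")
```

Without the registration, the serializer would fall back to an automatically generated prefix such as ``ns0``.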
.. function:: SubElement(parent, tag, attrib={}, **extra)
Subelement factory. This function creates an element instance, and appends it
to an existing element.
Subelement factory. This function creates an element instance, and appends
it to an existing element.
The element name, attribute names, and attribute values can be either
bytestrings or Unicode strings. *parent* is the parent element. *tag* is
the subelement name. *attrib* is an optional dictionary, containing element
attributes. *extra* contains additional attributes, given as keyword
arguments. Returns an element instance.
.. function:: tostring(element, encoding=None, method=None)
The element name, attribute names, and attribute values can be an ASCII-only
:class:`bytes` object or a :class:`str` object. *parent* is the parent
element. *tag* is the subelement name. *attrib* is an optional dictionary,
containing element attributes. *extra* contains additional attributes, given
as keyword arguments. Returns an element instance.
Generates a string representation of an XML element, including all
subelements. *element* is an :class:`Element` instance. *encoding* is the
output encoding (default is None). *method* is either ``"xml"``,
``"html"`` or ``"text"`` (default is ``"xml"``). Returns an (optionally)
encoded string containing the XML data.
.. function:: tostring(element, encoding=None)
.. function:: tostringlist(element, encoding=None, method=None)
Generates a string representation of an XML element, including all subelements.
*element* is an Element instance. *encoding* is the output encoding (default is
US-ASCII). Returns an encoded string containing the XML data.
Generates a string representation of an XML element, including all
subelements. *element* is an :class:`Element` instance. *encoding* is the
output encoding (default is None). *method* is either ``"xml"``,
``"html"`` or ``"text"`` (default is ``"xml"``). Returns a sequence object
containing the XML data.
.. versionadded:: 2.7
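The effect of the new *method* argument can be sketched as follows. The sample markup is invented for the example, and ``encoding="unicode"`` assumes Python 3 (under 2.7 the functions return encoded byte strings instead).

```python
import xml.etree.ElementTree as ET

# Invented sample markup with mixed text and a child element.
root = ET.fromstring("<p>Hello <b>world</b>!</p>")

# method="xml" (the default) keeps the markup.
as_xml = ET.tostring(root, encoding="unicode", method="xml")

# method="text" drops the tags and keeps only the text content.
as_text = ET.tostring(root, encoding="unicode", method="text")

# tostringlist() returns the same data as a sequence of pieces.
pieces = ET.tostringlist(root, encoding="unicode")
joined = "".join(pieces)
```

How the output is split into pieces is an implementation detail, so only the joined result should be relied on.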
.. function:: XML(text)
.. function:: XML(text, parser=None)
Parses an XML section from a string constant. This function can be used to
embed "XML literals" in Python code. *text* is a string containing XML data.
Returns an Element instance.
embed "XML literals" in Python code. *text* is a string containing XML
data. *parser* is an optional parser instance. If not given, the standard
:class:`XMLParser` parser is used. Returns an :class:`Element` instance.
.. function:: XMLID(text)
.. function:: XMLID(text, parser=None)
Parses an XML section from a string constant, and also returns a dictionary
which maps from element id:s to elements. *text* is a string containing XML
data. Returns a tuple containing an Element instance and a dictionary.
data. *parser* is an optional parser instance. If not given, the standard
:class:`XMLParser` parser is used. Returns a tuple containing an
:class:`Element` instance and a dictionary.
.. _elementtree-element-objects:
.. _elementtree-element-interface:
Element Objects
---------------
The Element Interface
---------------------
Element objects returned by Element or SubElement have the following methods
and attributes.
.. class:: Element(tag, attrib={}, **extra)
Element class. This class defines the Element interface, and provides a
reference implementation of this interface.
.. attribute:: Element.tag
The element name, attribute names, and attribute values can be either
bytestrings or Unicode strings. *tag* is the element name. *attrib* is
an optional dictionary, containing element attributes. *extra* contains
additional attributes, given as keyword arguments.
A string identifying what kind of data this element represents (the element
type, in other words).
.. attribute:: tag
.. attribute:: Element.text
A string identifying what kind of data this element represents (the
element type, in other words).
The *text* attribute can be used to hold additional data associated with the
element. As the name implies this attribute is usually a string but may be any
application-specific object. If the element is created from an XML file the
attribute will contain any text found between the element tags.
.. attribute:: text
.. attribute:: Element.tail
The *text* attribute can be used to hold additional data associated with
the element. As the name implies this attribute is usually a string but
may be any application-specific object. If the element is created from
an XML file the attribute will contain any text found between the element
tags.
The *tail* attribute can be used to hold additional data associated with the
element. This attribute is usually a string but may be any application-specific
object. If the element is created from an XML file the attribute will contain
any text found after the element's end tag and before the next tag.
.. attribute:: tail
.. attribute:: Element.attrib
The *tail* attribute can be used to hold additional data associated with
the element. This attribute is usually a string but may be any
application-specific object. If the element is created from an XML file
the attribute will contain any text found after the element's end tag and
before the next tag.
A dictionary containing the element's attributes. Note that while the *attrib*
value is always a real mutable Python dictionary, an ElementTree implementation
may choose to use another internal representation, and create the dictionary
only if someone asks for it. To take advantage of such implementations, use the
dictionary methods below whenever possible.
The following dictionary-like methods work on the element attributes.
.. attribute:: attrib
A dictionary containing the element's attributes. Note that while the
*attrib* value is always a real mutable Python dictionary, an ElementTree
implementation may choose to use another internal representation, and
create the dictionary only if someone asks for it. To take advantage of
such implementations, use the dictionary methods below whenever possible.
.. method:: Element.clear()
The following dictionary-like methods work on the element attributes.
.. method:: clear()
Resets an element. This function removes all subelements, clears all
attributes, and sets the text and tail attributes to None.
.. method:: Element.get(key, default=None)
.. method:: get(key, default=None)
Gets the element attribute named *key*.
Returns the attribute value, or *default* if the attribute was not found.
.. method:: Element.items()
.. method:: items()
Returns the element attributes as a sequence of (name, value) pairs. The
attributes are returned in an arbitrary order.
.. method:: Element.keys()
.. method:: keys()
Returns the elements attribute names as a list. The names are returned in an
arbitrary order.
Returns the elements attribute names as a list. The names are returned
in an arbitrary order.
.. method:: Element.set(key, value)
.. method:: set(key, value)
Set the attribute *key* on the element to *value*.
The following methods work on the element's children (subelements).
The following methods work on the element's children (subelements).
.. method:: Element.append(subelement)
.. method:: append(subelement)
Adds the element *subelement* to the end of this elements internal list of
subelements.
Adds the element *subelement* to the end of this elements internal list
of subelements.
.. method:: Element.find(match)
.. method:: extend(subelements)
Finds the first subelement matching *match*. *match* may be a tag name or path.
Returns an element instance or ``None``.
Appends *subelements* from a sequence object with zero or more elements.
Raises :exc:`AssertionError` if a subelement is not a valid object.
.. versionadded:: 2.7
.. method:: Element.findall(match)
Finds all subelements matching *match*. *match* may be a tag name or path.
Returns an iterable yielding all matching elements in document order.
.. method:: find(match)
Finds the first subelement matching *match*. *match* may be a tag name
or path. Returns an element instance or ``None``.
.. method:: Element.findtext(condition, default=None)
Finds text for the first subelement matching *condition*. *condition* may be a
tag name or path. Returns the text content of the first matching element, or
*default* if no element was found. Note that if the matching element has no
text content an empty string is returned.
.. method:: findall(match)
Finds all matching subelements, by tag name or path. Returns a list
containing all matching elements in document order.
.. method:: Element.getchildren()
Returns all subelements. The elements are returned in document order.
.. method:: findtext(match, default=None)
Finds text for the first subelement matching *match*. *match* may be
a tag name or path. Returns the text content of the first matching
element, or *default* if no element was found. Note that if the matching
element has no text content an empty string is returned.
.. method:: Element.getiterator(tag=None)
Creates a tree iterator with the current element as the root. The iterator
iterates over this element and all elements below it, in document (depth first)
order. If *tag* is not ``None`` or ``'*'``, only elements whose tag equals
*tag* are returned from the iterator.
.. method:: getchildren()
.. deprecated:: 2.7
Use ``list(elem)`` or iteration.
.. method:: Element.insert(index, element)
.. method:: getiterator(tag=None)
.. deprecated:: 2.7
Use method :meth:`Element.iter` instead.
.. method:: insert(index, element)
Inserts a subelement at the given position in this element.
.. method:: Element.makeelement(tag, attrib)
.. method:: iter(tag=None)
Creates a tree :term:`iterator` with the current element as the root.
The iterator iterates over this element and all elements below it, in
document (depth first) order. If *tag* is not ``None`` or ``'*'``, only
elements whose tag equals *tag* are returned from the iterator. If the
tree structure is modified during iteration, the result is undefined.
Creates a new element object of the same type as this element. Do not call this
method, use the SubElement factory function instead.
.. method:: iterfind(match)
Finds all matching subelements, by tag name or path. Returns an iterable
yielding all matching elements in document order.
.. method:: Element.remove(subelement)
.. versionadded:: 2.7
Removes *subelement* from the element. Unlike the findXYZ methods this method
compares elements based on the instance identity, not on tag value or contents.
Element objects also support the following sequence type methods for working
with subelements: :meth:`__delitem__`, :meth:`__getitem__`, :meth:`__setitem__`,
:meth:`__len__`.
.. method:: itertext()
Caution: Because Element objects do not define a :meth:`__bool__` method,
elements with no subelements will test as ``False``. ::
Creates a text iterator. The iterator loops over this element and all
subelements, in document order, and returns all inner text.
.. versionadded:: 2.7
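The three iteration methods described above (``iter``, ``iterfind`` and ``itertext``) could be exercised like this. The document content is made up for the illustration, and the sketch assumes Python 3.

```python
import xml.etree.ElementTree as ET

# Invented document: two sections, each with some text and a paragraph.
root = ET.fromstring(
    "<doc><sec>A<p>one</p></sec><sec>B<p>two</p></sec></doc>"
)

# iter() walks the element itself and all descendants, depth first.
tags = [elem.tag for elem in root.iter()]

# iterfind() lazily yields the elements matching a tag name or path.
p_texts = [p.text for p in root.iterfind("sec/p")]

# itertext() yields the inner text of the whole subtree in document order.
all_text = "".join(root.itertext())
```

Each call returns an iterator or iterable, so wrap it in ``list()`` when a real list is needed, as the updated example later in this diff does with ``list(p.iter("a"))``.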
.. method:: makeelement(tag, attrib)
Creates a new element object of the same type as this element. Do not
call this method, use the :func:`SubElement` factory function instead.
.. method:: remove(subelement)
Removes *subelement* from the element. Unlike the find\* methods this
method compares elements based on the instance identity, not on tag value
or contents.
:class:`Element` objects also support the following sequence type methods
for working with subelements: :meth:`__delitem__`, :meth:`__getitem__`,
:meth:`__setitem__`, :meth:`__len__`.
Caution: Elements with no subelements will test as ``False``. This behavior
will change in future versions. Use specific ``len(elem)`` or ``elem is
None`` test instead. ::
element = root.find('foo')
......@@ -306,11 +382,12 @@ ElementTree Objects
.. class:: ElementTree(element=None, file=None)
ElementTree wrapper class. This class represents an entire element hierarchy,
and adds some extra support for serialization to and from standard XML.
ElementTree wrapper class. This class represents an entire element
hierarchy, and adds some extra support for serialization to and from
standard XML.
*element* is the root element. The tree is initialized with the contents of the
XML *file* if given.
*element* is the root element. The tree is initialized with the contents
of the XML *file* if given.
.. method:: _setroot(element)
......@@ -320,56 +397,73 @@ ElementTree Objects
care. *element* is an element instance.
.. method:: find(path)
.. method:: find(match)
Finds the first toplevel element with given tag. Same as
getroot().find(path). *path* is the element to look for. Returns the
first matching element, or ``None`` if no element was found.
Finds the first toplevel element matching *match*. *match* may be a tag
name or path. Same as getroot().find(match). Returns the first matching
element, or ``None`` if no element was found.
.. method:: findall(path)
.. method:: findall(match)
Finds all toplevel elements with the given tag. Same as
getroot().findall(path). *path* is the element to look for. Returns a
list or :term:`iterator` containing all matching elements, in document
order.
Finds all matching subelements, by tag name or path. Same as
getroot().findall(match). *match* may be a tag name or path. Returns a
list containing all matching elements, in document order.
.. method:: findtext(path, default=None)
.. method:: findtext(match, default=None)
Finds the element text for the first toplevel element with given tag.
Same as getroot().findtext(path). *path* is the toplevel element to look
for. *default* is the value to return if the element was not
found. Returns the text content of the first matching element, or the
default value no element was found. Note that if the element has is
found, but has no text content, this method returns an empty string.
Same as getroot().findtext(match). *match* may be a tag name or path.
*default* is the value to return if the element was not found. Returns
the text content of the first matching element, or the default value no
element was found. Note that if the element is found, but has no text
content, this method returns an empty string.
.. method:: getiterator(tag=None)
.. deprecated:: 2.7
Use method :meth:`ElementTree.iter` instead.
.. method:: getroot()
Returns the root element for this tree.
.. method:: iter(tag=None)
Creates and returns a tree iterator for the root element. The iterator
loops over all elements in this tree, in section order. *tag* is the tag
to look for (default is to return all elements)
.. method:: getroot()
.. method:: iterfind(match)
Returns the root element for this tree.
Finds all matching subelements, by tag name or path. Same as
getroot().iterfind(match). Returns an iterable yielding all matching
elements in document order.
.. versionadded:: 2.7
.. method:: parse(source, parser=None)
Loads an external XML section into this element tree. *source* is a file
name or file object. *parser* is an optional parser instance. If not
given, the standard XMLTreeBuilder parser is used. Returns the section
given, the standard XMLParser parser is used. Returns the section
root element.
.. method:: write(file, encoding=None)
.. method:: write(file, encoding=None, xml_declaration=None, method=None)
Writes the element tree to a file, as XML. *file* is a file name, or a
file object opened for writing. *encoding* [1]_ is the output encoding
(default is US-ASCII).
(default is None). *xml_declaration* controls if an XML declaration
should be added to the file. Use False for never, True for always, None
for only if not US-ASCII or UTF-8 (default is None). *method* is either
``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``). Returns an
(optionally) encoded string.
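A small sketch of the *xml_declaration* argument, writing to an in-memory buffer rather than a named file. The tree content and the use of ``io.BytesIO`` are choices made for this example only, and it assumes Python 3.

```python
import io
import xml.etree.ElementTree as ET

# Invented tree; any Element can be wrapped in an ElementTree.
tree = ET.ElementTree(ET.fromstring("<root><child/></root>"))

# A binary in-memory buffer stands in for a file opened for writing.
buf = io.BytesIO()

# xml_declaration=True forces the declaration even for UTF-8 output,
# where the default (None) would omit it.
tree.write(buf, encoding="utf-8", xml_declaration=True)
output = buf.getvalue()
```

With ``xml_declaration=False`` the buffer would start directly with the root element.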
This is the XML file that is going to be manipulated::
......@@ -388,13 +482,13 @@ Example of changing the attribute "target" of every link in first paragraph::
>>> from xml.etree.ElementTree import ElementTree
>>> tree = ElementTree()
>>> tree.parse("index.xhtml")
<Element html at b7d3f1ec>
<Element 'html' at 0xb77e6fac>
>>> p = tree.find("body/p") # Finds first occurrence of tag p in body
>>> p
<Element p at 8416e0c>
>>> links = p.getiterator("a") # Returns list of all links
<Element 'p' at 0xb77ec26c>
>>> links = list(p.iter("a")) # Returns list of all links
>>> links
[<Element a at b7d4f9ec>, <Element a at b7d4fb0c>]
[<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
>>> for i in links: # Iterates through all found links
... i.attrib["target"] = "blank"
>>> tree.write("output.xhtml")
......@@ -407,12 +501,12 @@ QName Objects
.. class:: QName(text_or_uri, tag=None)
QName wrapper. This can be used to wrap a QName attribute value, in order to
get proper namespace handling on output. *text_or_uri* is a string containing
the QName value, in the form {uri}local, or, if the tag argument is given, the
URI part of a QName. If *tag* is given, the first argument is interpreted as an
URI, and this argument is interpreted as a local name. :class:`QName` instances
are opaque.
QName wrapper. This can be used to wrap a QName attribute value, in order
to get proper namespace handling on output. *text_or_uri* is a string
containing the QName value, in the form {uri}local, or, if the tag argument
is given, the URI part of a QName. If *tag* is given, the first argument is
interpreted as an URI, and this argument is interpreted as a local name.
:class:`QName` instances are opaque.
.. _elementtree-treebuilder-objects:
......@@ -423,29 +517,29 @@ TreeBuilder Objects
.. class:: TreeBuilder(element_factory=None)
Generic element structure builder. This builder converts a sequence of start,
data, and end method calls to a well-formed element structure. You can use this
class to build an element structure using a custom XML parser, or a parser for
some other XML-like format. The *element_factory* is called to create new
Element instances when given.
Generic element structure builder. This builder converts a sequence of
start, data, and end method calls to a well-formed element structure. You
can use this class to build an element structure using a custom XML parser,
or a parser for some other XML-like format. The *element_factory* is called
to create new :class:`Element` instances when given.
.. method:: close()
Flushes the parser buffers, and returns the toplevel document
element. Returns an Element instance.
Flushes the builder buffers, and returns the toplevel document
element. Returns an :class:`Element` instance.
.. method:: data(data)
Adds text to the current element. *data* is a string. This should be
either an ASCII-only :class:`bytes` object or a :class:`str` object.
either a bytestring, or a Unicode string.
.. method:: end(tag)
Closes the current element. *tag* is the element name. Returns the closed
element.
Closes the current element. *tag* is the element name. Returns the
closed element.
.. method:: start(tag, attrs)
......@@ -454,18 +548,32 @@ TreeBuilder Objects
containing element attributes. Returns the opened element.
.. _elementtree-xmltreebuilder-objects:
In addition, a custom :class:`TreeBuilder` object can provide the
following method:
XMLTreeBuilder Objects
----------------------
.. method:: doctype(name, pubid, system)
Handles a doctype declaration. *name* is the doctype name. *pubid* is
the public identifier. *system* is the system identifier. This method
does not exist on the default :class:`TreeBuilder` class.
.. class:: XMLTreeBuilder(html=0, target=None)
.. versionadded:: 2.7
Element structure builder for XML source data, based on the expat parser. *html*
are predefined HTML entities. This flag is not supported by the current
implementation. *target* is the target object. If omitted, the builder uses an
instance of the standard TreeBuilder class.
.. _elementtree-xmlparser-objects:
XMLParser Objects
-----------------
.. class:: XMLParser(html=0, target=None, encoding=None)
:class:`Element` structure builder for XML source data, based on the expat
parser. *html* are predefined HTML entities. This flag is not supported by
the current implementation. *target* is the target object. If omitted, the
builder uses an instance of the standard TreeBuilder class. *encoding* [1]_
is optional. If given, the value overrides the encoding specified in the
XML file.
.. method:: close()
......@@ -475,22 +583,23 @@ XMLTreeBuilder Objects
.. method:: doctype(name, pubid, system)
Handles a doctype declaration. *name* is the doctype name. *pubid* is the
public identifier. *system* is the system identifier.
.. deprecated:: 2.7
Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
target.
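The recommended replacement, a ``doctype`` method on a custom target, might look like the sketch below. The ``DoctypeRecorder`` class and the sample document are hypothetical, and it assumes a Python 3 interpreter.

```python
from xml.etree.ElementTree import XMLParser


class DoctypeRecorder:
    """Hypothetical parser target that only remembers the doctype."""

    def __init__(self):
        self.doctype_info = None

    def doctype(self, name, pubid, system):
        # Called by the parser when it sees a doctype declaration.
        self.doctype_info = (name, pubid, system)

    def start(self, tag, attrs):
        pass  # opening tags are ignored in this sketch

    def end(self, tag):
        pass  # closing tags are ignored as well

    def data(self, data):
        pass  # character data is ignored

    def close(self):
        # Whatever the target returns here is returned by parser.close().
        return self.doctype_info


parser = XMLParser(target=DoctypeRecorder())
parser.feed(
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    "<html></html>"
)
doctype_info = parser.close()
name, pubid, system = doctype_info
```

Keeping the handler on the target means the same target class works with any parser that follows this protocol, which appears to be the motivation for deprecating the parser-level method.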
.. method:: feed(data)
Feeds data to the parser. *data* is encoded data.
:meth:`XMLTreeBuilder.feed` calls *target*\'s :meth:`start` method
:meth:`XMLParser.feed` calls *target*\'s :meth:`start` method
for each opening tag, its :meth:`end` method for each closing tag,
and data is processed by method :meth:`data`. :meth:`XMLTreeBuilder.close`
and data is processed by method :meth:`data`. :meth:`XMLParser.close`
calls *target*\'s method :meth:`close`.
:class:`XMLTreeBuilder` can be used not only for building a tree structure.
:class:`XMLParser` can be used not only for building a tree structure.
This is an example of counting the maximum depth of an XML file::
>>> from xml.etree.ElementTree import XMLTreeBuilder
>>> from xml.etree.ElementTree import XMLParser
>>> class MaxDepth: # The target object of the parser
... maxDepth = 0
... depth = 0
......@@ -506,7 +615,7 @@ This is an example of counting the maximum depth of an XML file::
... return self.maxDepth
...
>>> target = MaxDepth()
>>> parser = XMLTreeBuilder(target=target)
>>> parser = XMLParser(target=target)
>>> exampleXml = """
... <a>
... <b>
......@@ -529,4 +638,3 @@ This is an example of counting the maximum depth of an XML file::
appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
and http://www.iana.org/assignments/character-sets.
:mod:`xml.etree` --- The ElementTree API for XML
================================================
.. module:: xml.etree
:synopsis: Package containing common ElementTree modules.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>
The ElementTree package is a simple, efficient, and quite popular library for
XML manipulation in Python. The :mod:`xml.etree` package contains the most
common components from the ElementTree API library. In the current release,
this package contains the :mod:`ElementTree`, :mod:`ElementPath`, and
:mod:`ElementInclude` modules from the full ElementTree distribution.
.. XXX To be continued!
.. seealso::
`ElementTree Overview <http://effbot.org/tag/elementtree>`_
The home page for :mod:`ElementTree`. This includes links to additional
documentation, alternative implementations, and other add-ons.
......@@ -402,12 +402,14 @@ def temp_cwd(name='tempcwd', quiet=False):
rmtree(name)
def findfile(file, here=__file__):
def findfile(file, here=__file__, subdir=None):
"""Try to find a file on sys.path and the working directory. If it is not
found the argument passed to the function is returned (this does not
necessarily signal failure; could still be the legitimate path)."""
if os.path.isabs(file):
return file
if subdir is not None:
file = os.path.join(subdir, file)
path = sys.path
path = [os.path.dirname(here)] + path
for dn in path:
......
# test for xml.dom.minidom
import os
import sys
import pickle
from test.support import verbose, run_unittest
from test.support import verbose, run_unittest, findfile
import unittest
import xml.dom
......@@ -14,12 +12,8 @@ from xml.dom.minidom import parse, Node, Document, parseString
from xml.dom.minidom import getDOMImplementation
if __name__ == "__main__":
base = sys.argv[0]
else:
base = __file__
tstfile = os.path.join(os.path.dirname(base), "test.xml")
del base
tstfile = findfile("test.xml", subdir="xmltestdata")
# The tests of DocumentType importing use these helpers to construct
# the documents to work with, since not all DOM builders actually
......
......@@ -15,7 +15,9 @@ from xml.sax.xmlreader import InputSource, AttributesImpl, AttributesNSImpl
from io import StringIO
from test.support import findfile, run_unittest
import unittest
import os
TEST_XMLFILE = findfile("test.xml", subdir="xmltestdata")
TEST_XMLFILE_OUT = findfile("test.xml.out", subdir="xmltestdata")
ns_uri = "http://www.python.org/xml-ns/saxtest/"
......@@ -311,7 +313,7 @@ class XMLFilterBaseTest(unittest.TestCase):
#
# ===========================================================================
xml_test_out = open(findfile("test.xml.out")).read()
xml_test_out = open(TEST_XMLFILE_OUT).read()
class ExpatReaderTest(XmlTestBase):
......@@ -323,7 +325,7 @@ class ExpatReaderTest(XmlTestBase):
xmlgen = XMLGenerator(result)
parser.setContentHandler(xmlgen)
parser.parse(open(findfile("test.xml")))
parser.parse(open(TEST_XMLFILE))
self.assertEquals(result.getvalue(), xml_test_out)
......@@ -452,7 +454,7 @@ class ExpatReaderTest(XmlTestBase):
xmlgen = XMLGenerator(result)
parser.setContentHandler(xmlgen)
parser.parse(findfile("test.xml"))
parser.parse(TEST_XMLFILE)
self.assertEquals(result.getvalue(), xml_test_out)
......@@ -462,7 +464,7 @@ class ExpatReaderTest(XmlTestBase):
xmlgen = XMLGenerator(result)
parser.setContentHandler(xmlgen)
parser.parse(InputSource(findfile("test.xml")))
parser.parse(InputSource(TEST_XMLFILE))
self.assertEquals(result.getvalue(), xml_test_out)
......@@ -473,7 +475,7 @@ class ExpatReaderTest(XmlTestBase):
parser.setContentHandler(xmlgen)
inpsrc = InputSource()
inpsrc.setByteStream(open(findfile("test.xml")))
inpsrc.setByteStream(open(TEST_XMLFILE))
parser.parse(inpsrc)
self.assertEquals(result.getvalue(), xml_test_out)
......@@ -534,9 +536,9 @@ class ExpatReaderTest(XmlTestBase):
xmlgen = XMLGenerator(result)
parser = create_parser()
parser.setContentHandler(xmlgen)
parser.parse(findfile("test.xml"))
parser.parse(TEST_XMLFILE)
self.assertEquals(parser.getSystemId(), findfile("test.xml"))
self.assertEquals(parser.getSystemId(), TEST_XMLFILE)
self.assertEquals(parser.getPublicId(), None)
......
# xml.etree test. This file contains enough tests to make sure that
# all included components work as they should. For a more extensive
# test suite, see the selftest script in the ElementTree distribution.
# all included components work as they should.
# Large parts are extracted from the upstream test suite.
# IMPORTANT: the same doctests are run from "test_xml_etree_c" in
# order to ensure consistency between the C implementation and the
# Python implementation.
#
# For this purpose, the module-level "ET" symbol is temporarily
# monkey-patched when running the "test_xml_etree_c" test suite.
# Don't re-import "xml.etree.ElementTree" module in the docstring,
# except if the test is specific to the Python implementation.
import doctest
import sys
from test import support
from test.support import findfile
from xml.etree import ElementTree as ET
SIMPLE_XMLFILE = findfile("simple.xml", subdir="xmltestdata")
SIMPLE_NS_XMLFILE = findfile("simple-ns.xml", subdir="xmltestdata")
SAMPLE_XML = """
SAMPLE_XML = """\
<body>
<tag>text</tag>
<tag />
<tag class='a'>text</tag>
<tag class='b' />
<section>
<tag>subtext</tag>
<tag class='b' id='inner'>subtext</tag>
</section>
</body>
"""
SAMPLE_SECTION = """\
<section>
<tag class='b' id='inner'>subtext</tag>
<nexttag />
<nextsection>
<tag />
</nextsection>
</section>
"""
SAMPLE_XML_NS = """
<body xmlns="http://effbot.org/ns">
<tag>text</tag>
......@@ -27,6 +51,7 @@ SAMPLE_XML_NS = """
</body>
"""
def sanity():
"""
Import sanity.
......@@ -40,35 +65,110 @@ def check_method(method):
if not hasattr(method, '__call__'):
print(method, "not callable")
def serialize(ET, elem):
def serialize(elem, to_string=True, **options):
import io
tree = ET.ElementTree(elem)
if options.get("encoding"):
file = io.BytesIO()
else:
file = io.StringIO()
tree.write(file)
tree = ET.ElementTree(elem)
tree.write(file, **options)
if to_string:
return file.getvalue()
else:
file.seek(0)
return file
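The reworked `serialize` helper picks a bytes or text buffer depending on whether an encoding was requested, then forwards every option to `ElementTree.write`. A minimal sketch of the same idea for a current Python 3 follows. The name `serialize_sketch` is hypothetical, and passing `encoding="unicode"` to get text output is an assumption about newer `ElementTree` versions; the helper in this commit does not do that.

```python
import io
import xml.etree.ElementTree as ET


def serialize_sketch(elem, to_string=True, **options):
    """Hypothetical variant of the serialize() helper for current Python 3."""
    if options.get("encoding"):
        # An explicit encoding produces bytes, so collect them in BytesIO.
        stream = io.BytesIO()
    else:
        # Assumption: encoding="unicode" asks write() for text output.
        stream = io.StringIO()
        options = dict(options, encoding="unicode")
    ET.ElementTree(elem).write(stream, **options)
    if to_string:
        return stream.getvalue()
    stream.seek(0)  # rewind so the caller can read from the start
    return stream


page = ET.XML("<html><link/><script>1 &lt; 2</script></html>")
as_text = serialize_sketch(page)                    # str
as_bytes = serialize_sketch(page, encoding="utf-8")  # bytes
as_html = serialize_sketch(page, method="html")      # HTML serialization rules
plain = serialize_sketch(page, method="text")        # text content only
stream = serialize_sketch(page, to_string=False)     # rewound file object
```

Forwarding `**options` is what lets the doctests exercise the new `method` keyword without a separate helper for each output format.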
def summarize(elem):
if elem.tag == ET.Comment:
return "<Comment>"
return elem.tag
def summarize_list(seq):
return list(map(summarize, seq))
return [summarize(elem) for elem in seq]
def normalize_crlf(tree):
    for elem in tree.iter():
        if elem.text:
            elem.text = elem.text.replace("\r\n", "\n")
        if elem.tail:
            elem.tail = elem.tail.replace("\r\n", "\n")
def normalize_exception(func, *args, **kwargs):
    # Ignore the exception __module__
    try:
        func(*args, **kwargs)
    except Exception as err:
        print("Traceback (most recent call last):")
        print("{}: {}".format(err.__class__.__name__, err))
def check_string(string):
len(string)
for char in string:
if len(char) != 1:
print("expected one-character string, got %r" % char)
new_string = string + ""
new_string = string + " "
string[:0]
def check_mapping(mapping):
    len(mapping)
    keys = mapping.keys()
    items = mapping.items()
    for key in keys:
        item = mapping[key]
    mapping["key"] = "value"
    if mapping["key"] != "value":
        print("expected value string, got %r" % mapping["key"])
def check_element(element):
    if not ET.iselement(element):
        print("not an element")
    if not hasattr(element, "tag"):
        print("no tag member")
    if not hasattr(element, "attrib"):
        print("no attrib member")
    if not hasattr(element, "text"):
        print("no text member")
    if not hasattr(element, "tail"):
        print("no tail member")
    check_string(element.tag)
    check_mapping(element.attrib)
    if element.text is not None:
        check_string(element.text)
    if element.tail is not None:
        check_string(element.tail)
    for elem in element:
        check_element(elem)
# --------------------------------------------------------------------
# element tree tests
def interface():
"""
Test element tree interface.
>>> from xml.etree import ElementTree as ET
>>> element = ET.Element("tag")
>>> check_element(element)
>>> tree = ET.ElementTree(element)
>>> check_element(tree.getroot())
>>> element = ET.Element("tag", key="value")
>>> element = ET.Element("t\\xe4g", key="value")
>>> tree = ET.ElementTree(element)
>>> repr(element) # doctest: +ELLIPSIS
"<Element 't\\xe4g' at 0x...>"
>>> element = ET.Element("tag", key="value")
Make sure all standard element methods exist.
>>> check_method(element.append)
>>> check_method(element.extend)
>>> check_method(element.insert)
>>> check_method(element.remove)
>>> check_method(element.getchildren)
>>> check_method(element.find)
>>> check_method(element.iterfind)
>>> check_method(element.findall)
>>> check_method(element.findtext)
>>> check_method(element.clear)
......@@ -76,38 +176,134 @@ def interface():
>>> check_method(element.set)
>>> check_method(element.keys)
>>> check_method(element.items)
>>> check_method(element.iter)
>>> check_method(element.itertext)
>>> check_method(element.getiterator)
These methods return an iterator. See bug 6472.
>>> check_method(element.iter("tag").__next__)
>>> check_method(element.iterfind("tag").__next__)
>>> check_method(element.iterfind("*").__next__)
>>> check_method(tree.iter("tag").__next__)
>>> check_method(tree.iterfind("tag").__next__)
>>> check_method(tree.iterfind("*").__next__)
These aliases are provided:
>>> assert ET.XML == ET.fromstring
>>> assert ET.PI == ET.ProcessingInstruction
>>> assert ET.XMLParser == ET.XMLTreeBuilder
"""
def simpleops():
"""
Basic method sanity checks.
>>> serialize(ET, element) # 1
>>> elem = ET.XML("<body><tag/></body>")
>>> serialize(elem)
'<body><tag /></body>'
>>> e = ET.Element("tag2")
>>> elem.append(e)
>>> serialize(elem)
'<body><tag /><tag2 /></body>'
>>> elem.remove(e)
>>> serialize(elem)
'<body><tag /></body>'
>>> elem.insert(0, e)
>>> serialize(elem)
'<body><tag2 /><tag /></body>'
>>> elem.remove(e)
>>> elem.extend([e])
>>> serialize(elem)
'<body><tag /><tag2 /></body>'
>>> elem.remove(e)
>>> element = ET.Element("tag", key="value")
>>> serialize(element) # 1
'<tag key="value" />'
>>> subelement = ET.Element("subtag")
>>> element.append(subelement)
>>> serialize(ET, element) # 2
>>> serialize(element) # 2
'<tag key="value"><subtag /></tag>'
>>> element.insert(0, subelement)
>>> serialize(ET, element) # 3
>>> serialize(element) # 3
'<tag key="value"><subtag /><subtag /></tag>'
>>> element.remove(subelement)
>>> serialize(ET, element) # 4
>>> serialize(element) # 4
'<tag key="value"><subtag /></tag>'
>>> element.remove(subelement)
>>> serialize(ET, element) # 5
>>> serialize(element) # 5
'<tag key="value" />'
>>> element.remove(subelement)
Traceback (most recent call last):
ValueError: list.remove(x): x not in list
>>> serialize(ET, element) # 6
>>> serialize(element) # 6
'<tag key="value" />'
>>> element[0:0] = [subelement, subelement, subelement]
>>> serialize(element[1])
'<subtag />'
>>> element[1:9] == [element[1], element[2]]
True
>>> element[:9:2] == [element[0], element[2]]
True
>>> del element[1:2]
>>> serialize(element)
'<tag key="value"><subtag /><subtag /></tag>'
"""
def cdata():
"""
Test CDATA handling (etc).
>>> serialize(ET.XML("<tag>hello</tag>"))
'<tag>hello</tag>'
>>> serialize(ET.XML("<tag>&#104;&#101;&#108;&#108;&#111;</tag>"))
'<tag>hello</tag>'
>>> serialize(ET.XML("<tag><![CDATA[hello]]></tag>"))
'<tag>hello</tag>'
"""
# Only with Python implementation
def simplefind():
"""
Test find methods using the elementpath fallback.
>>> from xml.etree import ElementTree
>>> CurrentElementPath = ElementTree.ElementPath
>>> ElementTree.ElementPath = ElementTree._SimpleElementPath()
>>> elem = ElementTree.XML(SAMPLE_XML)
>>> elem.find("tag").tag
'tag'
>>> ElementTree.ElementTree(elem).find("tag").tag
'tag'
>>> elem.findtext("tag")
'text'
>>> elem.findtext("tog")
>>> elem.findtext("tog", "default")
'default'
>>> ElementTree.ElementTree(elem).findtext("tag")
'text'
>>> summarize_list(elem.findall("tag"))
['tag', 'tag']
>>> summarize_list(elem.findall(".//tag"))
['tag', 'tag', 'tag']
Path syntax doesn't work in this case.
>>> elem.find("section/tag")
>>> elem.findtext("section/tag")
>>> summarize_list(elem.findall("section/tag"))
[]
>>> ElementTree.ElementPath = CurrentElementPath
"""
def find():
"""
Test find methods (including xpath syntax).
>>> from xml.etree import ElementTree as ET
>>> elem = ET.XML(SAMPLE_XML)
>>> elem.find("tag").tag
'tag'
......@@ -115,39 +311,67 @@ def find():
'tag'
>>> elem.find("section/tag").tag
'tag'
>>> elem.find("./tag").tag
'tag'
>>> ET.ElementTree(elem).find("./tag").tag
'tag'
>>> ET.ElementTree(elem).find("/tag").tag
'tag'
>>> elem[2] = ET.XML(SAMPLE_SECTION)
>>> elem.find("section/nexttag").tag
'nexttag'
>>> ET.ElementTree(elem).find("section/tag").tag
'tag'
>>> ET.ElementTree(elem).find("tog")
>>> ET.ElementTree(elem).find("tog/foo")
>>> elem.findtext("tag")
'text'
>>> elem.findtext("section/nexttag")
''
>>> elem.findtext("section/nexttag", "default")
''
>>> elem.findtext("tog")
>>> elem.findtext("tog", "default")
'default'
>>> ET.ElementTree(elem).findtext("tag")
'text'
>>> ET.ElementTree(elem).findtext("tog/foo")
>>> ET.ElementTree(elem).findtext("tog/foo", "default")
'default'
>>> ET.ElementTree(elem).findtext("./tag")
'text'
>>> ET.ElementTree(elem).findtext("/tag")
'text'
>>> elem.findtext("section/tag")
'subtext'
>>> ET.ElementTree(elem).findtext("section/tag")
'subtext'
>>> summarize_list(elem.findall("."))
['body']
>>> summarize_list(elem.findall("tag"))
['tag', 'tag']
>>> summarize_list(elem.findall("tog"))
[]
>>> summarize_list(elem.findall("tog/foo"))
[]
>>> summarize_list(elem.findall("*"))
['tag', 'tag', 'section']
>>> summarize_list(elem.findall(".//tag"))
['tag', 'tag', 'tag']
['tag', 'tag', 'tag', 'tag']
>>> summarize_list(elem.findall("section/tag"))
['tag']
>>> summarize_list(elem.findall("section//tag"))
['tag']
['tag', 'tag']
>>> summarize_list(elem.findall("section/*"))
['tag']
['tag', 'nexttag', 'nextsection']
>>> summarize_list(elem.findall("section//*"))
['tag']
['tag', 'nexttag', 'nextsection', 'tag']
>>> summarize_list(elem.findall("section/.//*"))
['tag']
['tag', 'nexttag', 'nextsection', 'tag']
>>> summarize_list(elem.findall("*/*"))
['tag']
['tag', 'nexttag', 'nextsection']
>>> summarize_list(elem.findall("*//*"))
['tag']
['tag', 'nexttag', 'nextsection', 'tag']
>>> summarize_list(elem.findall("*/tag"))
['tag']
>>> summarize_list(elem.findall("*/./tag"))
......@@ -155,13 +379,40 @@ def find():
>>> summarize_list(elem.findall("./tag"))
['tag', 'tag']
>>> summarize_list(elem.findall(".//tag"))
['tag', 'tag', 'tag']
['tag', 'tag', 'tag', 'tag']
>>> summarize_list(elem.findall("././tag"))
['tag', 'tag']
>>> summarize_list(ET.ElementTree(elem).findall("/tag"))
>>> summarize_list(elem.findall(".//tag[@class]"))
['tag', 'tag', 'tag']
>>> summarize_list(elem.findall(".//tag[@class='a']"))
['tag']
>>> summarize_list(elem.findall(".//tag[@class='b']"))
['tag', 'tag']
>>> summarize_list(elem.findall(".//tag[@id]"))
['tag']
>>> summarize_list(elem.findall(".//section[tag]"))
['section']
>>> summarize_list(elem.findall(".//section[element]"))
[]
>>> summarize_list(elem.findall("../tag"))
[]
>>> summarize_list(elem.findall("section/../tag"))
['tag', 'tag']
>>> summarize_list(ET.ElementTree(elem).findall("./tag"))
['tag', 'tag']
The following example is invalid in 1.2.
A leading '*' is assumed in 1.3.
>>> elem.findall("section//") == elem.findall("section//*")
True
ET's Path module handles this case incorrectly; this gives
a warning in 1.3, and the behaviour will be modified in 1.4.
>>> summarize_list(ET.ElementTree(elem).findall("/tag"))
['tag', 'tag']
>>> elem = ET.XML(SAMPLE_XML_NS)
>>> summarize_list(elem.findall("tag"))
[]
......@@ -171,21 +422,227 @@ def find():
['{http://effbot.org/ns}tag', '{http://effbot.org/ns}tag', '{http://effbot.org/ns}tag']
"""
def parseliteral():
r"""
def file_init():
"""
>>> import io
>>> from xml.etree import ElementTree as ET
>>> stringfile = io.BytesIO(SAMPLE_XML.encode("utf-8"))
>>> tree = ET.ElementTree(file=stringfile)
>>> tree.find("tag").tag
'tag'
>>> tree.find("section/tag").tag
'tag'
>>> tree = ET.ElementTree(file=SIMPLE_XMLFILE)
>>> tree.find("element").tag
'element'
>>> tree.find("element/../empty-element").tag
'empty-element'
"""
def bad_find():
"""
Check bad or unsupported path expressions.
>>> elem = ET.XML(SAMPLE_XML)
>>> elem.findall("/tag")
Traceback (most recent call last):
SyntaxError: cannot use absolute path on element
"""
def path_cache():
"""
Check that the path cache behaves sanely.
>>> elem = ET.XML(SAMPLE_XML)
>>> for i in range(10): ET.ElementTree(elem).find('./'+str(i))
>>> cache_len_10 = len(ET.ElementPath._cache)
>>> for i in range(10): ET.ElementTree(elem).find('./'+str(i))
>>> len(ET.ElementPath._cache) == cache_len_10
True
>>> for i in range(20): ET.ElementTree(elem).find('./'+str(i))
>>> len(ET.ElementPath._cache) > cache_len_10
True
>>> for i in range(600): ET.ElementTree(elem).find('./'+str(i))
>>> len(ET.ElementPath._cache) < 500
True
"""
def copy():
"""
Test copy handling (etc).
>>> import copy
>>> e1 = ET.XML("<tag>hello<foo/></tag>")
>>> e2 = copy.copy(e1)
>>> e3 = copy.deepcopy(e1)
>>> e1.find("foo").tag = "bar"
>>> serialize(e1)
'<tag>hello<bar /></tag>'
>>> serialize(e2)
'<tag>hello<bar /></tag>'
>>> serialize(e3)
'<tag>hello<foo /></tag>'
"""
def attrib():
"""
Test attribute handling.
>>> elem = ET.Element("tag")
>>> elem.get("key") # 1.1
>>> elem.get("key", "default") # 1.2
'default'
>>> elem.set("key", "value")
>>> elem.get("key") # 1.3
'value'
>>> elem = ET.Element("tag", key="value")
>>> elem.get("key") # 2.1
'value'
>>> elem.attrib # 2.2
{'key': 'value'}
>>> attrib = {"key": "value"}
>>> elem = ET.Element("tag", attrib)
>>> attrib.clear() # check for aliasing issues
>>> elem.get("key") # 3.1
'value'
>>> elem.attrib # 3.2
{'key': 'value'}
>>> attrib = {"key": "value"}
>>> elem = ET.Element("tag", **attrib)
>>> attrib.clear() # check for aliasing issues
>>> elem.get("key") # 4.1
'value'
>>> elem.attrib # 4.2
{'key': 'value'}
>>> elem = ET.Element("tag", {"key": "other"}, key="value")
>>> elem.get("key") # 5.1
'value'
>>> elem.attrib # 5.2
{'key': 'value'}
>>> elem = ET.Element('test')
>>> elem.text = "aa"
>>> elem.set('testa', 'testval')
>>> elem.set('testb', 'test2')
>>> ET.tostring(elem)
'<test testa="testval" testb="test2">aa</test>'
>>> sorted(elem.keys())
['testa', 'testb']
>>> sorted(elem.items())
[('testa', 'testval'), ('testb', 'test2')]
>>> elem.attrib['testb']
'test2'
>>> elem.attrib['testb'] = 'test1'
>>> elem.attrib['testc'] = 'test2'
>>> ET.tostring(elem)
'<test testa="testval" testb="test1" testc="test2">aa</test>'
"""
def makeelement():
"""
Test makeelement handling.
>>> elem = ET.Element("tag")
>>> attrib = {"key": "value"}
>>> subelem = elem.makeelement("subtag", attrib)
>>> if subelem.attrib is attrib:
... print("attrib aliasing")
>>> elem.append(subelem)
>>> serialize(elem)
'<tag><subtag key="value" /></tag>'
>>> elem.clear()
>>> serialize(elem)
'<tag />'
>>> elem.append(subelem)
>>> serialize(elem)
'<tag><subtag key="value" /></tag>'
>>> elem.extend([subelem, subelem])
>>> serialize(elem)
'<tag><subtag key="value" /><subtag key="value" /><subtag key="value" /></tag>'
>>> elem[:] = [subelem]
>>> serialize(elem)
'<tag><subtag key="value" /></tag>'
>>> elem[:] = tuple([subelem])
>>> serialize(elem)
'<tag><subtag key="value" /></tag>'
"""
def parsefile():
"""
Test parsing from file.
>>> tree = ET.parse(SIMPLE_XMLFILE)
>>> normalize_crlf(tree)
>>> tree.write(sys.stdout)
<root>
<element key="value">text</element>
<element>text</element>tail
<empty-element />
</root>
>>> tree = ET.parse(SIMPLE_NS_XMLFILE)
>>> normalize_crlf(tree)
>>> tree.write(sys.stdout)
<ns0:root xmlns:ns0="namespace">
<ns0:element key="value">text</ns0:element>
<ns0:element>text</ns0:element>tail
<ns0:empty-element />
</ns0:root>
>>> parser = ET.XMLParser()
>>> parser.version # XXX: Upgrade to 2.0.1?
'Expat 2.0.0'
>>> parser.feed(open(SIMPLE_XMLFILE).read())
>>> print(serialize(parser.close()))
<root>
<element key="value">text</element>
<element>text</element>tail
<empty-element />
</root>
>>> parser = ET.XMLTreeBuilder() # 1.2 compatibility
>>> parser.feed(open(SIMPLE_XMLFILE).read())
>>> print(serialize(parser.close()))
<root>
<element key="value">text</element>
<element>text</element>tail
<empty-element />
</root>
>>> target = ET.TreeBuilder()
>>> parser = ET.XMLParser(target=target)
>>> parser.feed(open(SIMPLE_XMLFILE).read())
>>> print(serialize(parser.close()))
<root>
<element key="value">text</element>
<element>text</element>tail
<empty-element />
</root>
"""
def parseliteral():
"""
>>> element = ET.XML("<html><body>text</body></html>")
>>> ET.ElementTree(element).write(sys.stdout)
<html><body>text</body></html>
>>> element = ET.fromstring("<html><body>text</body></html>")
>>> ET.ElementTree(element).write(sys.stdout)
<html><body>text</body></html>
>>> sequence = ["<html><body>", "text</bo", "dy></html>"]
>>> element = ET.fromstringlist(sequence)
>>> print(ET.tostring(element))
<html><body>text</body></html>
>>> print(repr(ET.tostring(element, "ascii")))
b"<?xml version='1.0' encoding='ascii'?>\n<html><body>text</body></html>"
>>> print("".join(ET.tostringlist(element)))
<html><body>text</body></html>
>>> ET.tostring(element, "ascii")
b"<?xml version='1.0' encoding='ascii'?>\\n<html><body>text</body></html>"
>>> _, ids = ET.XMLID("<html><body>text</body></html>")
>>> len(ids)
0
......@@ -196,25 +653,578 @@ def parseliteral():
'body'
"""
def iterparse():
"""
Test iterparse interface.
>>> iterparse = ET.iterparse
>>> context = iterparse(SIMPLE_XMLFILE)
>>> action, elem = next(context)
>>> print(action, elem.tag)
end element
>>> for action, elem in context:
... print(action, elem.tag)
end element
end empty-element
end root
>>> context.root.tag
'root'
>>> context = iterparse(SIMPLE_NS_XMLFILE)
>>> for action, elem in context:
... print(action, elem.tag)
end {namespace}element
end {namespace}element
end {namespace}empty-element
end {namespace}root
>>> events = ()
>>> context = iterparse(SIMPLE_XMLFILE, events)
>>> for action, elem in context:
... print(action, elem.tag)
>>> events = ()
>>> context = iterparse(SIMPLE_XMLFILE, events=events)
>>> for action, elem in context:
... print(action, elem.tag)
>>> events = ("start", "end")
>>> context = iterparse(SIMPLE_XMLFILE, events)
>>> for action, elem in context:
... print(action, elem.tag)
start root
start element
end element
start element
end element
start empty-element
end empty-element
end root
>>> events = ("start", "end", "start-ns", "end-ns")
>>> context = iterparse(SIMPLE_NS_XMLFILE, events)
>>> for action, elem in context:
... if action in ("start", "end"):
... print(action, elem.tag)
... else:
... print(action, elem)
start-ns ('', 'namespace')
start {namespace}root
start {namespace}element
end {namespace}element
start {namespace}element
end {namespace}element
start {namespace}empty-element
end {namespace}empty-element
end {namespace}root
end-ns None
>>> events = ("start", "end", "bogus")
>>> context = iterparse(SIMPLE_XMLFILE, events)
Traceback (most recent call last):
ValueError: unknown event 'bogus'
>>> import io
>>> source = io.BytesIO(
... b"<?xml version='1.0' encoding='iso-8859-1'?>\\n"
... b"<body xmlns='http://&#233;ffbot.org/ns'\\n"
... b" xmlns:cl\\xe9='http://effbot.org/ns'>text</body>\\n")
>>> events = ("start-ns",)
>>> context = iterparse(source, events)
>>> for action, elem in context:
... print(action, elem)
start-ns ('', 'http://\\xe9ffbot.org/ns')
start-ns ('cl\\xe9', 'http://effbot.org/ns')
>>> source = io.StringIO("<document />junk")
>>> try:
... for action, elem in iterparse(source):
... print(action, elem.tag)
... except ET.ParseError as v:
... print(v)
junk after document element: line 1, column 12
"""
def writefile():
"""
>>> elem = ET.Element("tag")
>>> elem.text = "text"
>>> serialize(elem)
'<tag>text</tag>'
>>> ET.SubElement(elem, "subtag").text = "subtext"
>>> serialize(elem)
'<tag>text<subtag>subtext</subtag></tag>'
def check_encoding(ET, encoding):
Test tag suppression
>>> elem.tag = None
>>> serialize(elem)
'text<subtag>subtext</subtag>'
>>> elem.insert(0, ET.Comment("comment"))
>>> serialize(elem) # assumes 1.3
'text<!--comment--><subtag>subtext</subtag>'
>>> elem[0] = ET.PI("key", "value")
>>> serialize(elem)
'text<?key value?><subtag>subtext</subtag>'
"""
def custom_builder():
"""
Test parser with a custom builder.
>>> class Builder:
... def start(self, tag, attrib):
... print("start", tag)
... def end(self, tag):
... print("end", tag)
... def data(self, text):
... pass
>>> builder = Builder()
>>> parser = ET.XMLParser(target=builder)
>>> parser.feed(open(SIMPLE_XMLFILE, "r").read())
start root
start element
end element
start element
end element
start empty-element
end empty-element
end root
>>> class Builder:
... def start(self, tag, attrib):
... print("start", tag)
... def end(self, tag):
... print("end", tag)
... def data(self, text):
... pass
... def pi(self, target, data):
... print("pi", target, repr(data))
... def comment(self, data):
... print("comment", repr(data))
>>> builder = Builder()
>>> parser = ET.XMLParser(target=builder)
>>> parser.feed(open(SIMPLE_NS_XMLFILE, "r").read())
pi pi 'data'
comment ' comment '
start {namespace}root
start {namespace}element
end {namespace}element
start {namespace}element
end {namespace}element
start {namespace}empty-element
end {namespace}empty-element
end {namespace}root
"""
def getchildren():
"""
Test Element.getchildren()
>>> tree = ET.parse(open(SIMPLE_XMLFILE, "rb"))
>>> for elem in tree.getroot().iter():
... summarize_list(elem.getchildren())
['element', 'element', 'empty-element']
[]
[]
[]
>>> for elem in tree.getiterator():
... summarize_list(elem.getchildren())
['element', 'element', 'empty-element']
[]
[]
[]
>>> elem = ET.XML(SAMPLE_XML)
>>> len(elem.getchildren())
3
>>> len(elem[2].getchildren())
1
>>> elem[:] == elem.getchildren()
True
>>> child1 = elem[0]
>>> child2 = elem[2]
>>> del elem[1:2]
>>> len(elem.getchildren())
2
>>> child1 == elem[0]
True
>>> child2 == elem[1]
True
>>> elem[0:2] = [child2, child1]
>>> child2 == elem[0]
True
>>> child1 == elem[1]
True
>>> child1 == elem[0]
False
>>> elem.clear()
>>> elem.getchildren()
[]
"""
def writestring():
"""
>>> elem = ET.XML("<html><body>text</body></html>")
>>> ET.tostring(elem)
'<html><body>text</body></html>'
>>> elem = ET.fromstring("<html><body>text</body></html>")
>>> ET.tostring(elem)
'<html><body>text</body></html>'
"""
>>> from xml.etree import ElementTree as ET
>>> check_encoding(ET, "ascii")
>>> check_encoding(ET, "us-ascii")
>>> check_encoding(ET, "iso-8859-1")
>>> check_encoding(ET, "iso-8859-15")
>>> check_encoding(ET, "cp437")
>>> check_encoding(ET, "mac-roman")
def check_encoding(encoding):
"""
>>> check_encoding("ascii")
>>> check_encoding("us-ascii")
>>> check_encoding("iso-8859-1")
>>> check_encoding("iso-8859-15")
>>> check_encoding("cp437")
>>> check_encoding("mac-roman")
"""
ET.XML("<?xml version='1.0' encoding='%s'?><xml />" % encoding)
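`check_encoding` only confirms that a document declaring a given encoding parses. The round trip the `encoding()` doctests below rely on can be sketched as follows, assuming a current Python 3 where `ET.tostring` returns bytes when an encoding name is given. The sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# A byte string whose XML declaration matches its actual encoding.
source = (
    "<?xml version='1.0' encoding='iso-8859-1'?><body>t\xe3g</body>"
).encode("iso-8859-1")
elem = ET.XML(source)

# Characters outside the target encoding become numeric character references.
as_ascii = ET.tostring(elem, "us-ascii")
# Encodings other than UTF-8 and US-ASCII get an XML declaration.
as_latin1 = ET.tostring(elem, "iso-8859-1")
# UTF-8 output encodes the character directly.
as_utf8 = ET.tostring(elem, "utf-8")
```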
def processinginstruction():
def encoding():
r"""
Test ProcessingInstruction directly
Test encoding issues.
>>> from xml.etree import ElementTree as ET
>>> elem = ET.Element("tag")
>>> elem.text = "abc"
>>> serialize(elem)
'<tag>abc</tag>'
>>> serialize(elem, encoding="utf-8")
b'<tag>abc</tag>'
>>> serialize(elem, encoding="us-ascii")
b'<tag>abc</tag>'
>>> serialize(elem, encoding="iso-8859-1")
b"<?xml version='1.0' encoding='iso-8859-1'?>\n<tag>abc</tag>"
>>> elem.text = "<&\"\'>"
>>> serialize(elem)
'<tag>&lt;&amp;"\'&gt;</tag>'
>>> serialize(elem, encoding="utf-8")
b'<tag>&lt;&amp;"\'&gt;</tag>'
>>> serialize(elem, encoding="us-ascii") # cdata characters
b'<tag>&lt;&amp;"\'&gt;</tag>'
>>> serialize(elem, encoding="iso-8859-1")
b'<?xml version=\'1.0\' encoding=\'iso-8859-1\'?>\n<tag>&lt;&amp;"\'&gt;</tag>'
>>> elem.attrib["key"] = "<&\"\'>"
>>> elem.text = None
>>> serialize(elem)
'<tag key="&lt;&amp;&quot;\'&gt;" />'
>>> serialize(elem, encoding="utf-8")
b'<tag key="&lt;&amp;&quot;\'&gt;" />'
>>> serialize(elem, encoding="us-ascii")
b'<tag key="&lt;&amp;&quot;\'&gt;" />'
>>> serialize(elem, encoding="iso-8859-1")
b'<?xml version=\'1.0\' encoding=\'iso-8859-1\'?>\n<tag key="&lt;&amp;&quot;\'&gt;" />'
>>> elem.text = '\xe5\xf6\xf6<>'
>>> elem.attrib.clear()
>>> serialize(elem)
'<tag>\xe5\xf6\xf6&lt;&gt;</tag>'
>>> serialize(elem, encoding="utf-8")
b'<tag>\xc3\xa5\xc3\xb6\xc3\xb6&lt;&gt;</tag>'
>>> serialize(elem, encoding="us-ascii")
b'<tag>&#229;&#246;&#246;&lt;&gt;</tag>'
>>> serialize(elem, encoding="iso-8859-1")
b"<?xml version='1.0' encoding='iso-8859-1'?>\n<tag>\xe5\xf6\xf6&lt;&gt;</tag>"
>>> elem.attrib["key"] = '\xe5\xf6\xf6<>'
>>> elem.text = None
>>> serialize(elem)
'<tag key="\xe5\xf6\xf6&lt;&gt;" />'
>>> serialize(elem, encoding="utf-8")
b'<tag key="\xc3\xa5\xc3\xb6\xc3\xb6&lt;&gt;" />'
>>> serialize(elem, encoding="us-ascii")
b'<tag key="&#229;&#246;&#246;&lt;&gt;" />'
>>> serialize(elem, encoding="iso-8859-1")
b'<?xml version=\'1.0\' encoding=\'iso-8859-1\'?>\n<tag key="\xe5\xf6\xf6&lt;&gt;" />'
"""
def methods():
r"""
Test serialization methods.
>>> e = ET.XML("<html><link/><script>1 &lt; 2</script></html>")
>>> e.tail = "\n"
>>> serialize(e)
'<html><link /><script>1 &lt; 2</script></html>\n'
>>> serialize(e, method=None)
'<html><link /><script>1 &lt; 2</script></html>\n'
>>> serialize(e, method="xml")
'<html><link /><script>1 &lt; 2</script></html>\n'
>>> serialize(e, method="html")
'<html><link><script>1 < 2</script></html>\n'
>>> serialize(e, method="text")
'1 < 2\n'
"""
def iterators():
"""
Test iterators.
>>> e = ET.XML("<html><body>this is a <i>paragraph</i>.</body>..</html>")
>>> summarize_list(e.iter())
['html', 'body', 'i']
>>> summarize_list(e.find("body").iter())
['body', 'i']
>>> summarize(next(e.iter()))
'html'
>>> "".join(e.itertext())
'this is a paragraph...'
>>> "".join(e.find("body").itertext())
'this is a paragraph.'
>>> next(e.itertext())
'this is a '
Method iterparse should return an iterator. See bug 6472.
>>> sourcefile = serialize(e, to_string=False)
>>> next(ET.iterparse(sourcefile)) # doctest: +ELLIPSIS
('end', <Element 'i' at 0x...>)
>>> tree = ET.ElementTree(None)
>>> tree.iter()
Traceback (most recent call last):
AttributeError: 'NoneType' object has no attribute 'iter'
"""
ENTITY_XML = """\
<!DOCTYPE points [
<!ENTITY % user-entities SYSTEM 'user-entities.xml'>
%user-entities;
]>
<document>&entity;</document>
"""
def entity():
"""
Test entity handling.
1) good entities
>>> e = ET.XML("<document title='&#x8230;'>test</document>")
>>> serialize(e, encoding="us-ascii")
b'<document title="&#33328;">test</document>'
>>> serialize(e)
'<document title="\u8230">test</document>'
2) bad entities
>>> normalize_exception(ET.XML, "<document>&entity;</document>")
Traceback (most recent call last):
ParseError: undefined entity: line 1, column 10
>>> normalize_exception(ET.XML, ENTITY_XML)
Traceback (most recent call last):
ParseError: undefined entity &entity;: line 5, column 10
3) custom entity
>>> parser = ET.XMLParser()
>>> parser.entity["entity"] = "text"
>>> parser.feed(ENTITY_XML)
>>> root = parser.close()
>>> serialize(root)
'<document>text</document>'
"""
def error(xml):
"""
Test error handling.
>>> issubclass(ET.ParseError, SyntaxError)
True
>>> error("foo").position
(1, 0)
>>> error("<tag>&foo;</tag>").position
(1, 5)
>>> error("foobar<").position
(1, 6)
"""
    try:
        ET.XML(xml)
    except ET.ParseError:
        return sys.exc_info()[1]
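The `error()` helper above returns the caught exception so the doctests can inspect its `position` attribute. A small sketch of the same pattern, with the hypothetical name `try_parse`, shows how calling code might report the failure location. It assumes the `ParseError` class introduced with ElementTree 1.3.

```python
import xml.etree.ElementTree as ET


def try_parse(text):
    """Hypothetical helper: return (element, None) or (None, error)."""
    try:
        return ET.XML(text), None
    except ET.ParseError as exc:
        return None, exc


good, no_error = try_parse("<tag>ok</tag>")
nothing, failure = try_parse("<tag>&foo;</tag>")

# position is a (line, column) tuple describing where parsing stopped.
line, column = failure.position
message = "parse failed at line {}, column {}".format(line, column)
```

Because `ParseError` subclasses `SyntaxError`, code written against the 1.2 behaviour that catches `SyntaxError` should keep working.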
def namespace():
"""
Test namespace issues.
1) xml namespace
>>> elem = ET.XML("<tag xml:lang='en' />")
>>> serialize(elem) # 1.1
'<tag xml:lang="en" />'
2) other "well-known" namespaces
>>> elem = ET.XML("<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' />")
>>> serialize(elem) # 2.1
'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" />'
>>> elem = ET.XML("<html:html xmlns:html='http://www.w3.org/1999/xhtml' />")
>>> serialize(elem) # 2.2
'<html:html xmlns:html="http://www.w3.org/1999/xhtml" />'
>>> elem = ET.XML("<soap:Envelope xmlns:soap='http://schemas.xmlsoap.org/soap/envelope' />")
>>> serialize(elem) # 2.3
'<ns0:Envelope xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope" />'
3) unknown namespaces
>>> elem = ET.XML(SAMPLE_XML_NS)
>>> print(serialize(elem))
<ns0:body xmlns:ns0="http://effbot.org/ns">
<ns0:tag>text</ns0:tag>
<ns0:tag />
<ns0:section>
<ns0:tag>subtext</ns0:tag>
</ns0:section>
</ns0:body>
"""
def qname():
"""
Test QName handling.
1) decorated tags
>>> elem = ET.Element("{uri}tag")
>>> serialize(elem) # 1.1
'<ns0:tag xmlns:ns0="uri" />'
>>> elem = ET.Element(ET.QName("{uri}tag"))
>>> serialize(elem) # 1.2
'<ns0:tag xmlns:ns0="uri" />'
>>> elem = ET.Element(ET.QName("uri", "tag"))
>>> serialize(elem) # 1.3
'<ns0:tag xmlns:ns0="uri" />'
2) decorated attributes
>>> elem.clear()
>>> elem.attrib["{uri}key"] = "value"
>>> serialize(elem) # 2.1
'<ns0:tag xmlns:ns0="uri" ns0:key="value" />'
>>> elem.clear()
>>> elem.attrib[ET.QName("{uri}key")] = "value"
>>> serialize(elem) # 2.2
'<ns0:tag xmlns:ns0="uri" ns0:key="value" />'
3) decorated values are not converted by default, but the
QName wrapper can be used for values
>>> elem.clear()
>>> elem.attrib["{uri}key"] = "{uri}value"
>>> serialize(elem) # 3.1
'<ns0:tag xmlns:ns0="uri" ns0:key="{uri}value" />'
>>> elem.clear()
>>> elem.attrib["{uri}key"] = ET.QName("{uri}value")
>>> serialize(elem) # 3.2
'<ns0:tag xmlns:ns0="uri" ns0:key="ns0:value" />'
>>> elem.clear()
>>> subelem = ET.Element("tag")
>>> subelem.attrib["{uri1}key"] = ET.QName("{uri2}value")
>>> elem.append(subelem)
>>> elem.append(subelem)
>>> serialize(elem) # 3.3
'<ns0:tag xmlns:ns0="uri" xmlns:ns1="uri1" xmlns:ns2="uri2"><tag ns1:key="ns2:value" /><tag ns1:key="ns2:value" /></ns0:tag>'
4) Direct QName tests
>>> str(ET.QName('ns', 'tag'))
'{ns}tag'
>>> str(ET.QName('{ns}tag'))
'{ns}tag'
>>> q1 = ET.QName('ns', 'tag')
>>> q2 = ET.QName('ns', 'tag')
>>> q1 == q2
True
>>> q2 = ET.QName('ns', 'other-tag')
>>> q1 == q2
False
>>> q1 == 'ns:tag'
False
>>> q1 == '{ns}tag'
True
"""
def doctype_public():
"""
Test PUBLIC doctype.
>>> elem = ET.XML('<!DOCTYPE html PUBLIC'
... ' "-//W3C//DTD XHTML 1.0 Transitional//EN"'
... ' "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
... '<html>text</html>')
"""
def xpath_tokenizer(p):
"""
Test the XPath tokenizer.
>>> # tests from the xml specification
>>> xpath_tokenizer("*")
['*']
>>> xpath_tokenizer("text()")
['text', '()']
>>> xpath_tokenizer("@name")
['@', 'name']
>>> xpath_tokenizer("@*")
['@', '*']
>>> xpath_tokenizer("para[1]")
['para', '[', '1', ']']
>>> xpath_tokenizer("para[last()]")
['para', '[', 'last', '()', ']']
>>> xpath_tokenizer("*/para")
['*', '/', 'para']
>>> xpath_tokenizer("/doc/chapter[5]/section[2]")
['/', 'doc', '/', 'chapter', '[', '5', ']', '/', 'section', '[', '2', ']']
>>> xpath_tokenizer("chapter//para")
['chapter', '//', 'para']
>>> xpath_tokenizer("//para")
['//', 'para']
>>> xpath_tokenizer("//olist/item")
['//', 'olist', '/', 'item']
>>> xpath_tokenizer(".")
['.']
>>> xpath_tokenizer(".//para")
['.', '//', 'para']
>>> xpath_tokenizer("..")
['..']
>>> xpath_tokenizer("../@lang")
['..', '/', '@', 'lang']
>>> xpath_tokenizer("chapter[title]")
['chapter', '[', 'title', ']']
>>> xpath_tokenizer("employee[@secretary and @assistant]")
['employee', '[', '@', 'secretary', '', 'and', '', '@', 'assistant', ']']
>>> # additional tests
>>> xpath_tokenizer("{http://spam}egg")
['{http://spam}egg']
>>> xpath_tokenizer("./spam.egg")
['.', '/', 'spam.egg']
>>> xpath_tokenizer(".//{http://spam}egg")
['.', '//', '{http://spam}egg']
"""
from xml.etree import ElementPath
out = []
for op, tag in ElementPath.xpath_tokenizer(p):
out.append(op or tag)
return out
def processinginstruction():
"""
Test ProcessingInstruction directly
>>> ET.tostring(ET.ProcessingInstruction('test', 'instruction'))
'<?test instruction?>'
......@@ -226,20 +1236,7 @@ def processinginstruction():
>>> ET.tostring(ET.PI('test', '<testing&>'))
'<?test <testing&>?>'
>>> ET.tostring(ET.PI('test', '<testing&>\xe3'), 'latin1')
b"<?xml version='1.0' encoding='latin1'?>\n<?test <testing&>\xe3?>"
"""
def check_issue6233():
"""
>>> from xml.etree import ElementTree as ET
>>> e = ET.XML("<?xml version='1.0' encoding='utf-8'?><body>t\xe3g</body>")
>>> ET.tostring(e, 'ascii')
b"<?xml version='1.0' encoding='ascii'?>\\n<body>t&#227;g</body>"
>>> e = ET.XML("<?xml version='1.0' encoding='iso-8859-1'?><body>t\xe3g</body>".encode('iso-8859-1')) # create byte string with the right encoding
>>> ET.tostring(e, 'ascii')
b"<?xml version='1.0' encoding='ascii'?>\\n<body>t&#227;g</body>"
b"<?xml version='1.0' encoding='latin1'?>\\n<?test <testing&>\\xe3?>"
"""
#
......@@ -306,9 +1303,9 @@ XINCLUDE["default.xml"] = """\
<?xml version='1.0'?>
<document xmlns:xi="http://www.w3.org/2001/XInclude">
<p>Example.</p>
<xi:include href="samples/simple.xml"/>
<xi:include href="{}"/>
</document>
"""
""".format(SIMPLE_XMLFILE)
def xinclude_loader(href, parse="xml", encoding=None):
try:
......@@ -329,7 +1326,7 @@ def xinclude():
>>> document = xinclude_loader("C1.xml")
>>> ElementInclude.include(document, xinclude_loader)
>>> print(serialize(ET, document)) # C1
>>> print(serialize(document)) # C1
<document>
<p>120 Mz is adequate for an average home user.</p>
<disclaimer>
......@@ -343,7 +1340,7 @@ def xinclude():
>>> document = xinclude_loader("C2.xml")
>>> ElementInclude.include(document, xinclude_loader)
>>> print(serialize(ET, document)) # C2
>>> print(serialize(document)) # C2
<document>
<p>This document has been accessed
324387 times.</p>
......@@ -353,7 +1350,7 @@ def xinclude():
>>> document = xinclude_loader("C3.xml")
>>> ElementInclude.include(document, xinclude_loader)
>>> print(serialize(ET, document)) # C3
>>> print(serialize(document)) # C3
<document>
<p>The following is the source of the "data.xml" resource:</p>
<example>&lt;?xml version='1.0'?&gt;
......@@ -370,13 +1367,489 @@ def xinclude():
>>> ElementInclude.include(document, xinclude_loader)
Traceback (most recent call last):
IOError: resource not found
>>> # print serialize(ET, document) # C5
>>> # print(serialize(document)) # C5
"""
def xinclude_default():
"""
>>> from xml.etree import ElementInclude
>>> document = xinclude_loader("default.xml")
>>> ElementInclude.include(document)
>>> print(serialize(document)) # default
<document>
<p>Example.</p>
<root>
<element key="value">text</element>
<element>text</element>tail
<empty-element />
</root>
</document>
"""
#
# badly formatted xi:include tags
XINCLUDE_BAD = {}
XINCLUDE_BAD["B1.xml"] = """\
<?xml version='1.0'?>
<document xmlns:xi="http://www.w3.org/2001/XInclude">
<p>120 Mz is adequate for an average home user.</p>
<xi:include href="disclaimer.xml" parse="BAD_TYPE"/>
</document>
"""
XINCLUDE_BAD["B2.xml"] = """\
<?xml version='1.0'?>
<div xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:fallback></xi:fallback>
</div>
"""
def xinclude_failures():
r"""
Test failure to locate included XML file.
>>> from xml.etree import ElementInclude
>>> def none_loader(href, parser, encoding=None):
... return None
>>> document = ET.XML(XINCLUDE["C1.xml"])
>>> ElementInclude.include(document, loader=none_loader)
Traceback (most recent call last):
xml.etree.ElementInclude.FatalIncludeError: cannot load 'disclaimer.xml' as 'xml'
Test failure to locate included text file.
>>> document = ET.XML(XINCLUDE["C2.xml"])
>>> ElementInclude.include(document, loader=none_loader)
Traceback (most recent call last):
xml.etree.ElementInclude.FatalIncludeError: cannot load 'count.txt' as 'text'
Test bad parse type.
>>> document = ET.XML(XINCLUDE_BAD["B1.xml"])
>>> ElementInclude.include(document, loader=none_loader)
Traceback (most recent call last):
xml.etree.ElementInclude.FatalIncludeError: unknown parse type in xi:include tag ('BAD_TYPE')
Test xi:fallback outside xi:include.
>>> document = ET.XML(XINCLUDE_BAD["B2.xml"])
>>> ElementInclude.include(document, loader=none_loader)
Traceback (most recent call last):
xml.etree.ElementInclude.FatalIncludeError: xi:fallback tag must be child of xi:include ('{http://www.w3.org/2001/XInclude}fallback')
"""
# --------------------------------------------------------------------
# reported bugs
def bug_xmltoolkit21():
"""
marshaller gives obscure errors for non-string values
>>> elem = ET.Element(123)
>>> serialize(elem) # tag
Traceback (most recent call last):
TypeError: cannot serialize 123 (type int)
>>> elem = ET.Element("elem")
>>> elem.text = 123
>>> serialize(elem) # text
Traceback (most recent call last):
TypeError: cannot serialize 123 (type int)
>>> elem = ET.Element("elem")
>>> elem.tail = 123
>>> serialize(elem) # tail
Traceback (most recent call last):
TypeError: cannot serialize 123 (type int)
>>> elem = ET.Element("elem")
>>> elem.set(123, "123")
>>> serialize(elem) # attribute key
Traceback (most recent call last):
TypeError: cannot serialize 123 (type int)
>>> elem = ET.Element("elem")
>>> elem.set("123", 123)
>>> serialize(elem) # attribute value
Traceback (most recent call last):
TypeError: cannot serialize 123 (type int)
"""
def bug_xmltoolkit25():
"""
typo in ElementTree.findtext
>>> elem = ET.XML(SAMPLE_XML)
>>> tree = ET.ElementTree(elem)
>>> tree.findtext("tag")
'text'
>>> tree.findtext("section/tag")
'subtext'
"""
def bug_xmltoolkit28():
"""
.//tag causes exceptions
>>> tree = ET.XML("<doc><table><tbody/></table></doc>")
>>> summarize_list(tree.findall(".//thead"))
[]
>>> summarize_list(tree.findall(".//tbody"))
['tbody']
"""
def test_main():
def bug_xmltoolkitX1():
"""
dump() doesn't flush the output buffer
>>> tree = ET.XML("<doc><table><tbody/></table></doc>")
>>> ET.dump(tree); print("tail")
<doc><table><tbody /></table></doc>
tail
"""
def bug_xmltoolkit39():
"""
non-ascii element and attribute names don't work
>>> tree = ET.XML(b"<?xml version='1.0' encoding='iso-8859-1'?><t\\xe4g />")
>>> ET.tostring(tree, "utf-8")
b'<t\\xc3\\xa4g />'
>>> tree = ET.XML(b"<?xml version='1.0' encoding='iso-8859-1'?><tag \\xe4ttr='v&#228;lue' />")
>>> tree.attrib
{'\\xe4ttr': 'v\\xe4lue'}
>>> ET.tostring(tree, "utf-8")
b'<tag \\xc3\\xa4ttr="v\\xc3\\xa4lue" />'
>>> tree = ET.XML(b"<?xml version='1.0' encoding='iso-8859-1'?><t\\xe4g>text</t\\xe4g>")
>>> ET.tostring(tree, "utf-8")
b'<t\\xc3\\xa4g>text</t\\xc3\\xa4g>'
>>> tree = ET.Element("t\u00e4g")
>>> ET.tostring(tree, "utf-8")
b'<t\\xc3\\xa4g />'
>>> tree = ET.Element("tag")
>>> tree.set("\u00e4ttr", "v\u00e4lue")
>>> ET.tostring(tree, "utf-8")
b'<tag \\xc3\\xa4ttr="v\\xc3\\xa4lue" />'
"""
def bug_xmltoolkit54():
"""
problems handling internally defined entities
>>> e = ET.XML("<!DOCTYPE doc [<!ENTITY ldots '&#x8230;'>]><doc>&ldots;</doc>")
>>> serialize(e, encoding="us-ascii")
b'<doc>&#33328;</doc>'
>>> serialize(e)
'<doc>\u8230</doc>'
"""
def bug_xmltoolkit55():
"""
make sure we're reporting the first error, not the last
>>> normalize_exception(ET.XML, b"<!DOCTYPE doc SYSTEM 'doc.dtd'><doc>&ldots;&ndots;&rdots;</doc>")
Traceback (most recent call last):
ParseError: undefined entity &ldots;: line 1, column 36
"""
class ExceptionFile:
def read(self, x):
raise IOError
def xmltoolkit60():
"""
Handle crash in stream source.
>>> tree = ET.parse(ExceptionFile())
Traceback (most recent call last):
IOError
"""
XMLTOOLKIT62_DOC = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE patent-application-publication SYSTEM "pap-v15-2001-01-31.dtd" []>
<patent-application-publication>
<subdoc-abstract>
<paragraph id="A-0001" lvl="0">A new cultivar of Begonia plant named &lsquo;BCT9801BEG&rsquo;.</paragraph>
</subdoc-abstract>
</patent-application-publication>"""
def xmltoolkit62():
"""
Don't crash when using custom entities.
>>> xmltoolkit62()
'A new cultivar of Begonia plant named \u2018BCT9801BEG\u2019.'
"""
ENTITIES = {'rsquo': '\u2019', 'lsquo': '\u2018'}
parser = ET.XMLTreeBuilder()
parser.entity.update(ENTITIES)
parser.feed(XMLTOOLKIT62_DOC)
t = parser.close()
return t.find('.//paragraph').text
def xmltoolkit63():
"""
Check reference leak.
>>> xmltoolkit63()
>>> count = sys.getrefcount(None)
>>> for i in range(1000):
... xmltoolkit63()
>>> sys.getrefcount(None) - count
0
"""
tree = ET.TreeBuilder()
tree.start("tag", {})
tree.data("text")
tree.end("tag")
# --------------------------------------------------------------------
def bug_200708_newline():
r"""
Preserve newlines in attributes.
>>> e = ET.Element('SomeTag', text="def _f():\n return 3\n")
>>> ET.tostring(e)
'<SomeTag text="def _f():&#10; return 3&#10;" />'
>>> ET.XML(ET.tostring(e)).get("text")
'def _f():\n return 3\n'
>>> ET.tostring(ET.XML(ET.tostring(e)))
'<SomeTag text="def _f():&#10; return 3&#10;" />'
"""
def bug_200708_close():
"""
Test default builder.
>>> parser = ET.XMLParser() # default
>>> parser.feed("<element>some text</element>")
>>> summarize(parser.close())
'element'
Test custom builder.
>>> class EchoTarget:
... def close(self):
... return ET.Element("element") # simulate root
>>> parser = ET.XMLParser(EchoTarget())
>>> parser.feed("<element>some text</element>")
>>> summarize(parser.close())
'element'
"""
def bug_200709_default_namespace():
"""
>>> e = ET.Element("{default}elem")
>>> s = ET.SubElement(e, "{default}elem")
>>> serialize(e, default_namespace="default") # 1
'<elem xmlns="default"><elem /></elem>'
>>> e = ET.Element("{default}elem")
>>> s = ET.SubElement(e, "{default}elem")
>>> s = ET.SubElement(e, "{not-default}elem")
>>> serialize(e, default_namespace="default") # 2
'<elem xmlns="default" xmlns:ns1="not-default"><elem /><ns1:elem /></elem>'
>>> e = ET.Element("{default}elem")
>>> s = ET.SubElement(e, "{default}elem")
>>> s = ET.SubElement(e, "elem") # unprefixed name
>>> serialize(e, default_namespace="default") # 3
Traceback (most recent call last):
ValueError: cannot use non-qualified names with default_namespace option
"""
def bug_200709_register_namespace():
"""
>>> ET.tostring(ET.Element("{http://namespace.invalid/does/not/exist/}title"))
'<ns0:title xmlns:ns0="http://namespace.invalid/does/not/exist/" />'
>>> ET.register_namespace("foo", "http://namespace.invalid/does/not/exist/")
>>> ET.tostring(ET.Element("{http://namespace.invalid/does/not/exist/}title"))
'<foo:title xmlns:foo="http://namespace.invalid/does/not/exist/" />'
And the Dublin Core namespace is in the default list:
>>> ET.tostring(ET.Element("{http://purl.org/dc/elements/1.1/}title"))
'<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/" />'
"""
def bug_200709_element_comment():
"""
Not sure if this can be fixed, really (since the serializer needs
ET.Comment, not cET.comment).
>>> a = ET.Element('a')
>>> a.append(ET.Comment('foo'))
>>> a[0].tag == ET.Comment
True
>>> a = ET.Element('a')
>>> a.append(ET.PI('foo'))
>>> a[0].tag == ET.PI
True
"""
def bug_200709_element_insert():
"""
>>> a = ET.Element('a')
>>> b = ET.SubElement(a, 'b')
>>> c = ET.SubElement(a, 'c')
>>> d = ET.Element('d')
>>> a.insert(0, d)
>>> summarize_list(a)
['d', 'b', 'c']
>>> a.insert(-1, d)
>>> summarize_list(a)
['d', 'b', 'd', 'c']
"""
def bug_200709_iter_comment():
"""
>>> a = ET.Element('a')
>>> b = ET.SubElement(a, 'b')
>>> comment_b = ET.Comment("TEST-b")
>>> b.append(comment_b)
>>> summarize_list(a.iter(ET.Comment))
['<Comment>']
"""
# --------------------------------------------------------------------
# reported on bugs.python.org
def bug_1534630():
"""
>>> bob = ET.TreeBuilder()
>>> e = bob.data("data")
>>> e = bob.start("tag", {})
>>> e = bob.end("tag")
>>> e = bob.close()
>>> serialize(e)
'<tag />'
"""
def check_issue6233():
"""
>>> e = ET.XML(b"<?xml version='1.0' encoding='utf-8'?><body>t\\xc3\\xa3g</body>")
>>> ET.tostring(e, 'ascii')
b"<?xml version='1.0' encoding='ascii'?>\\n<body>t&#227;g</body>"
>>> e = ET.XML(b"<?xml version='1.0' encoding='iso-8859-1'?><body>t\\xe3g</body>")
>>> ET.tostring(e, 'ascii')
b"<?xml version='1.0' encoding='ascii'?>\\n<body>t&#227;g</body>"
"""
def check_issue3151():
"""
>>> e = ET.XML('<prefix:localname xmlns:prefix="${stuff}"/>')
>>> e.tag
'{${stuff}}localname'
>>> t = ET.ElementTree(e)
>>> ET.tostring(e)
'<ns0:localname xmlns:ns0="${stuff}" />'
"""
def check_issue6565():
"""
>>> elem = ET.XML("<body><tag/></body>")
>>> summarize_list(elem)
['tag']
>>> newelem = ET.XML(SAMPLE_XML)
>>> elem[:] = newelem[:]
>>> summarize_list(elem)
['tag', 'tag', 'section']
"""
# --------------------------------------------------------------------
class CleanContext(object):
"""Provide default namespace mapping and path cache."""
def __enter__(self):
from xml.etree import ElementTree
self._nsmap = ElementTree._namespace_map
self._path_cache = ElementTree.ElementPath._cache
# Copy the default namespace mapping
ElementTree._namespace_map = self._nsmap.copy()
# Copy the path cache (should be empty)
ElementTree.ElementPath._cache = self._path_cache.copy()
def __exit__(self, *args):
from xml.etree import ElementTree
# Restore mapping and path cache
ElementTree._namespace_map = self._nsmap
ElementTree.ElementPath._cache = self._path_cache
def test_main(module_name='xml.etree.ElementTree'):
import warnings
from test import test_xml_etree
def ignore(message, category=DeprecationWarning):
warnings.filterwarnings("ignore", message, category)
# The same doctests are used for both the Python and the C implementations
assert test_xml_etree.ET.__name__ == module_name
with warnings.catch_warnings(), CleanContext():
# Search behaviour is broken if search path starts with "/".
ignore("This search is broken in 1.3 and earlier, and will be fixed "
"in a future version. If you rely on the current behaviour, "
"change it to '.+'", FutureWarning)
# Element.getchildren() and Element.getiterator() are deprecated.
ignore("This method will be removed in future versions. "
"Use .+ instead.")
# XMLParser.doctype() is deprecated.
ignore("This method of XMLParser is deprecated. "
"Define doctype.. method on the TreeBuilder target.")
support.run_doctest(test_xml_etree, verbosity=True)
# The module should not be changed by the tests
assert test_xml_etree.ET.__name__ == module_name
if __name__ == '__main__':
test_main()
# xml.etree test for cElementTree
import doctest
import sys
from test import support
ET = support.import_module('xml.etree.cElementTree')
cET = support.import_module('xml.etree.cElementTree')
SAMPLE_XML = """
<body>
<tag>text</tag>
<tag />
<section>
<tag>subtext</tag>
</section>
</body>
"""
SAMPLE_XML_NS = """
<body xmlns="http://effbot.org/ns">
<tag>text</tag>
<tag />
<section>
<tag>subtext</tag>
</section>
</body>
"""
# cElementTree specific tests
def sanity():
"""
......@@ -34,187 +14,26 @@ def sanity():
>>> from xml.etree import cElementTree
"""
def check_method(method):
if not hasattr(method, '__call__'):
print(method, "not callable")
def serialize(ET, elem):
import io
file = io.StringIO()
tree = ET.ElementTree(elem)
tree.write(file)
return file.getvalue()
def summarize(elem):
return elem.tag
def summarize_list(seq):
return list(map(summarize, seq))
def interface():
"""
Test element tree interface.
>>> element = ET.Element("tag", key="value")
>>> tree = ET.ElementTree(element)
Make sure all standard element methods exist.
>>> check_method(element.append)
>>> check_method(element.insert)
>>> check_method(element.remove)
>>> check_method(element.getchildren)
>>> check_method(element.find)
>>> check_method(element.findall)
>>> check_method(element.findtext)
>>> check_method(element.clear)
>>> check_method(element.get)
>>> check_method(element.set)
>>> check_method(element.keys)
>>> check_method(element.items)
>>> check_method(element.getiterator)
Basic method sanity checks.
>>> serialize(ET, element) # 1
'<tag key="value" />'
>>> subelement = ET.Element("subtag")
>>> element.append(subelement)
>>> serialize(ET, element) # 2
'<tag key="value"><subtag /></tag>'
>>> element.insert(0, subelement)
>>> serialize(ET, element) # 3
'<tag key="value"><subtag /><subtag /></tag>'
>>> element.remove(subelement)
>>> serialize(ET, element) # 4
'<tag key="value"><subtag /></tag>'
>>> element.remove(subelement)
>>> serialize(ET, element) # 5
'<tag key="value" />'
>>> element.remove(subelement)
Traceback (most recent call last):
ValueError: list.remove(x): x not in list
>>> serialize(ET, element) # 6
'<tag key="value" />'
"""
def find():
"""
Test find methods (including xpath syntax).
>>> elem = ET.XML(SAMPLE_XML)
>>> elem.find("tag").tag
'tag'
>>> ET.ElementTree(elem).find("tag").tag
'tag'
>>> elem.find("section/tag").tag
'tag'
>>> ET.ElementTree(elem).find("section/tag").tag
'tag'
>>> elem.findtext("tag")
'text'
>>> elem.findtext("tog")
>>> elem.findtext("tog", "default")
'default'
>>> ET.ElementTree(elem).findtext("tag")
'text'
>>> elem.findtext("section/tag")
'subtext'
>>> ET.ElementTree(elem).findtext("section/tag")
'subtext'
>>> summarize_list(elem.findall("tag"))
['tag', 'tag']
>>> summarize_list(elem.findall("*"))
['tag', 'tag', 'section']
>>> summarize_list(elem.findall(".//tag"))
['tag', 'tag', 'tag']
>>> summarize_list(elem.findall("section/tag"))
['tag']
>>> summarize_list(elem.findall("section//tag"))
['tag']
>>> summarize_list(elem.findall("section/*"))
['tag']
>>> summarize_list(elem.findall("section//*"))
['tag']
>>> summarize_list(elem.findall("section/.//*"))
['tag']
>>> summarize_list(elem.findall("*/*"))
['tag']
>>> summarize_list(elem.findall("*//*"))
['tag']
>>> summarize_list(elem.findall("*/tag"))
['tag']
>>> summarize_list(elem.findall("*/./tag"))
['tag']
>>> summarize_list(elem.findall("./tag"))
['tag', 'tag']
>>> summarize_list(elem.findall(".//tag"))
['tag', 'tag', 'tag']
>>> summarize_list(elem.findall("././tag"))
['tag', 'tag']
>>> summarize_list(ET.ElementTree(elem).findall("/tag"))
['tag', 'tag']
>>> summarize_list(ET.ElementTree(elem).findall("./tag"))
['tag', 'tag']
>>> elem = ET.XML(SAMPLE_XML_NS)
>>> summarize_list(elem.findall("tag"))
[]
>>> summarize_list(elem.findall("{http://effbot.org/ns}tag"))
['{http://effbot.org/ns}tag', '{http://effbot.org/ns}tag']
>>> summarize_list(elem.findall(".//{http://effbot.org/ns}tag"))
['{http://effbot.org/ns}tag', '{http://effbot.org/ns}tag', '{http://effbot.org/ns}tag']
"""
def parseliteral():
r"""
>>> element = ET.XML("<html><body>text</body></html>")
>>> ET.ElementTree(element).write(sys.stdout)
<html><body>text</body></html>
>>> element = ET.fromstring("<html><body>text</body></html>")
>>> ET.ElementTree(element).write(sys.stdout)
<html><body>text</body></html>
>>> print(ET.tostring(element))
<html><body>text</body></html>
>>> print(repr(ET.tostring(element, "ascii")))
b"<?xml version='1.0' encoding='ascii'?>\n<html><body>text</body></html>"
>>> _, ids = ET.XMLID("<html><body>text</body></html>")
>>> len(ids)
0
>>> _, ids = ET.XMLID("<html><body id='body'>text</body></html>")
>>> len(ids)
1
>>> ids["body"].tag
'body'
"""
def check_encoding(encoding):
"""
>>> check_encoding("ascii")
>>> check_encoding("us-ascii")
>>> check_encoding("iso-8859-1")
>>> check_encoding("iso-8859-15")
>>> check_encoding("cp437")
>>> check_encoding("mac-roman")
"""
ET.XML(
"<?xml version='1.0' encoding='%s'?><xml />" % encoding
)
def bug_1534630():
"""
>>> bob = ET.TreeBuilder()
>>> e = bob.data("data")
>>> e = bob.start("tag", {})
>>> e = bob.end("tag")
>>> e = bob.close()
>>> serialize(ET, e)
'<tag />'
"""
def test_main():
from test import test_xml_etree_c
from test import test_xml_etree, test_xml_etree_c
# Run the tests specific to the C implementation
support.run_doctest(test_xml_etree_c, verbosity=True)
# Assign the C implementation before running the doctests
# Patch the __name__, to prevent confusion with the pure Python test
pyET = test_xml_etree.ET
py__name__ = test_xml_etree.__name__
test_xml_etree.ET = cET
if __name__ != '__main__':
test_xml_etree.__name__ = __name__
try:
# Run the same test suite as xml.etree.ElementTree
test_xml_etree.test_main(module_name='xml.etree.cElementTree')
finally:
test_xml_etree.ET = pyET
test_xml_etree.__name__ = py__name__
if __name__ == '__main__':
test_main()
<?pi data?>
<!-- comment -->
<root xmlns='namespace'>
<element key='value'>text</element>
<element>text</element>tail
<empty-element/>
</root>
<!-- comment -->
<root>
<element key='value'>text</element>
<element>text</element>tail
<empty-element/>
</root>
#
# ElementTree
# $Id: ElementInclude.py 1862 2004-06-18 07:31:02Z Fredrik $
# $Id: ElementInclude.py 3375 2008-02-13 08:05:08Z fredrik $
#
# limited xinclude support for element trees
#
......@@ -16,7 +16,7 @@
# --------------------------------------------------------------------
# The ElementTree toolkit is
#
# Copyright (c) 1999-2004 by Fredrik Lundh
# Copyright (c) 1999-2008 by Fredrik Lundh
#
# By obtaining, using, and/or copying this software and/or its
# associated documentation, you agree that you have read, understood,
......@@ -42,7 +42,7 @@
# --------------------------------------------------------------------
# Licensed to PSF under a Contributor Agreement.
# See http://www.python.org/2.4/license for licensing details.
# See http://www.python.org/psf/license for licensing details.
##
# Limited XInclude support for the ElementTree package.
......
#
# ElementTree
# $Id: ElementPath.py 1858 2004-06-17 21:31:41Z Fredrik $
# $Id: ElementPath.py 3375 2008-02-13 08:05:08Z fredrik $
#
# limited xpath support for element trees
#
......@@ -8,8 +8,13 @@
# 2003-05-23 fl created
# 2003-05-28 fl added support for // etc
# 2003-08-27 fl fixed parsing of periods in element names
# 2007-09-10 fl new selection engine
# 2007-09-12 fl fixed parent selector
# 2007-09-13 fl added iterfind; changed findall to return a list
# 2007-11-30 fl added namespaces support
# 2009-10-30 fl added child element value filter
#
# Copyright (c) 2003-2004 by Fredrik Lundh. All rights reserved.
# Copyright (c) 2003-2009 by Fredrik Lundh. All rights reserved.
#
# fredrik@pythonware.com
# http://www.pythonware.com
......@@ -17,7 +22,7 @@
# --------------------------------------------------------------------
# The ElementTree toolkit is
#
# Copyright (c) 1999-2004 by Fredrik Lundh
# Copyright (c) 1999-2009 by Fredrik Lundh
#
# By obtaining, using, and/or copying this software and/or its
# associated documentation, you agree that you have read, understood,
......@@ -43,7 +48,7 @@
# --------------------------------------------------------------------
# Licensed to PSF under a Contributor Agreement.
# See http://www.python.org/2.4/license for licensing details.
# See http://www.python.org/psf/license for licensing details.
##
# Implementation module for XPath support. There's usually no reason
......@@ -53,146 +58,246 @@
import re
xpath_tokenizer = re.compile(
"(::|\.\.|\(\)|[/.*:\[\]\(\)@=])|((?:\{[^}]+\})?[^/:\[\]\(\)@=\s]+)|\s+"
).findall
xpath_tokenizer_re = re.compile(
"("
"'[^']*'|\"[^\"]*\"|"
"::|"
"//?|"
"\.\.|"
"\(\)|"
"[/.*:\[\]\(\)@=])|"
"((?:\{[^}]+\})?[^/\[\]\(\)@=\s]+)|"
"\s+"
)
class xpath_descendant_or_self:
pass
def xpath_tokenizer(pattern, namespaces=None):
for token in xpath_tokenizer_re.findall(pattern):
tag = token[1]
if tag and tag[0] != "{" and ":" in tag:
try:
prefix, uri = tag.split(":", 1)
if not namespaces:
raise KeyError
yield token[0], "{%s}%s" % (namespaces[prefix], uri)
except KeyError:
raise SyntaxError("prefix %r not found in prefix map" % prefix)
else:
yield token
##
# Wrapper for a compiled XPath.
def get_parent_map(context):
parent_map = context.parent_map
if parent_map is None:
context.parent_map = parent_map = {}
for p in context.root.iter():
for e in p:
parent_map[e] = p
return parent_map
class Path:
def prepare_child(next, token):
tag = token[1]
def select(context, result):
for elem in result:
for e in elem:
if e.tag == tag:
yield e
return select
##
# Create a Path instance from an XPath expression.
def prepare_star(next, token):
def select(context, result):
for elem in result:
for e in elem:
yield e
return select
def __init__(self, path):
tokens = xpath_tokenizer(path)
# the current version supports 'path/path'-style expressions only
self.path = []
self.tag = None
if tokens and tokens[0][0] == "/":
raise SyntaxError("cannot use absolute path on element")
while tokens:
op, tag = tokens.pop(0)
if tag or op == "*":
self.path.append(tag or op)
elif op == ".":
pass
elif op == "/":
self.path.append(xpath_descendant_or_self())
continue
else:
raise SyntaxError("unsupported path syntax (%s)" % op)
if tokens:
op, tag = tokens.pop(0)
if op != "/":
raise SyntaxError(
"expected path separator (%s)" % (op or tag)
)
if self.path and isinstance(self.path[-1], xpath_descendant_or_self):
raise SyntaxError("path cannot end with //")
if len(self.path) == 1 and isinstance(self.path[0], type("")):
self.tag = self.path[0]
##
# Find first matching object.
def find(self, element):
tag = self.tag
if tag is None:
nodeset = self.findall(element)
if not nodeset:
return None
return nodeset[0]
for elem in element:
if elem.tag == tag:
return elem
return None
def prepare_self(next, token):
def select(context, result):
for elem in result:
yield elem
return select
##
# Find text for first matching object.
def findtext(self, element, default=None):
tag = self.tag
if tag is None:
nodeset = self.findall(element)
if not nodeset:
return default
return nodeset[0].text or ""
for elem in element:
if elem.tag == tag:
return elem.text or ""
return default
def prepare_descendant(next, token):
token = next()
if token[0] == "*":
tag = "*"
elif not token[0]:
tag = token[1]
else:
raise SyntaxError("invalid descendant")
def select(context, result):
for elem in result:
for e in elem.iter(tag):
if e is not elem:
yield e
return select
##
# Find all matching objects.
def prepare_parent(next, token):
def select(context, result):
# FIXME: raise error if .. is applied at toplevel?
parent_map = get_parent_map(context)
result_map = {}
for elem in result:
if elem in parent_map:
parent = parent_map[elem]
if parent not in result_map:
result_map[parent] = None
yield parent
return select
def findall(self, element):
nodeset = [element]
index = 0
def prepare_predicate(next, token):
# FIXME: replace with real parser!!! refs:
# http://effbot.org/zone/simple-iterator-parser.htm
# http://javascript.crockford.com/tdop/tdop.html
signature = []
predicate = []
while 1:
try:
path = self.path[index]
index = index + 1
except IndexError:
return nodeset
set = []
if isinstance(path, xpath_descendant_or_self):
try:
tag = self.path[index]
if not isinstance(tag, type("")):
tag = None
else:
index = index + 1
except IndexError:
tag = None # invalid path
for node in nodeset:
new = list(node.getiterator(tag))
if new and new[0] is node:
set.extend(new[1:])
token = next()
if token[0] == "]":
break
if token[0] and token[0][:1] in "'\"":
token = "'", token[0][1:-1]
signature.append(token[0] or "-")
predicate.append(token[1])
signature = "".join(signature)
# use signature to determine predicate type
if signature == "@-":
# [@attribute] predicate
key = predicate[1]
def select(context, result):
for elem in result:
if elem.get(key) is not None:
yield elem
return select
if signature == "@-='":
# [@attribute='value']
key = predicate[1]
value = predicate[-1]
def select(context, result):
for elem in result:
if elem.get(key) == value:
yield elem
return select
if signature == "-" and not re.match("\d+$", predicate[0]):
# [tag]
tag = predicate[0]
def select(context, result):
for elem in result:
if elem.find(tag) is not None:
yield elem
return select
if signature == "-='" and not re.match("\d+$", predicate[0]):
# [tag='value']
tag = predicate[0]
value = predicate[-1]
def select(context, result):
for elem in result:
for e in elem.findall(tag):
if "".join(e.itertext()) == value:
yield elem
break
return select
if signature == "-" or signature == "-()" or signature == "-()-":
# [index] or [last()] or [last()-index]
if signature == "-":
index = int(predicate[0]) - 1
else:
set.extend(new)
if predicate[0] != "last":
raise SyntaxError("unsupported function")
if signature == "-()-":
try:
index = int(predicate[2]) - 1
except ValueError:
raise SyntaxError("unsupported expression")
else:
for node in nodeset:
for node in node:
if path == "*" or node.tag == path:
set.append(node)
if not set:
return []
nodeset = set
index = -1
def select(context, result):
parent_map = get_parent_map(context)
for elem in result:
try:
parent = parent_map[elem]
# FIXME: what if the selector is "*" ?
elems = list(parent.findall(elem.tag))
if elems[index] is elem:
yield elem
except (IndexError, KeyError):
pass
return select
raise SyntaxError("invalid predicate")
ops = {
"": prepare_child,
"*": prepare_star,
".": prepare_self,
"..": prepare_parent,
"//": prepare_descendant,
"[": prepare_predicate,
}
_cache = {}
class _SelectorContext:
parent_map = None
def __init__(self, root):
self.root = root
# --------------------------------------------------------------------
##
# (Internal) Compile path.
def _compile(path):
p = _cache.get(path)
if p is not None:
return p
p = Path(path)
if len(_cache) >= 100:
# Generate all matching objects.
def iterfind(elem, path, namespaces=None):
# compile selector pattern
if path[-1:] == "/":
path = path + "*" # implicit all (FIXME: keep this?)
try:
selector = _cache[path]
except KeyError:
if len(_cache) > 100:
_cache.clear()
_cache[path] = p
return p
if path[:1] == "/":
raise SyntaxError("cannot use absolute path on element")
next = iter(xpath_tokenizer(path, namespaces)).__next__
token = next()
selector = []
while 1:
try:
selector.append(ops[token[0]](next, token))
except StopIteration:
raise SyntaxError("invalid path")
try:
token = next()
if token[0] == "/":
token = next()
except StopIteration:
break
_cache[path] = selector
# execute selector pattern
result = [elem]
context = _SelectorContext(elem)
for select in selector:
result = select(context, result)
return result
##
# Find first matching object.
def find(element, path):
return _compile(path).find(element)
def find(elem, path, namespaces=None):
try:
return next(iterfind(elem, path, namespaces))
except StopIteration:
return None
##
# Find text for first matching object.
# Find all matching objects.
def findtext(element, path, default=None):
return _compile(path).findtext(element, default)
def findall(elem, path, namespaces=None):
return list(iterfind(elem, path, namespaces))
##
# Find all matching objects.
# Find text for first matching object.
def findall(element, path):
return _compile(path).findall(element)
def findtext(elem, path, default=None, namespaces=None):
try:
elem = next(iterfind(elem, path, namespaces))
return elem.text or ""
except StopIteration:
return default
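# Illustrative sketch (not part of the commit): the predicate forms handled by
# prepare_predicate and the namespaces argument, driven through the public
# Element.findall() API. The sample document and the helper `positions` are
# made up for this example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<doc>"
    "<item id='a'><name>first</name></item>"
    "<item><name>second</name></item>"
    "<item id='c'/>"
    "</doc>"
)
children = list(doc)


def positions(path):
    """Return the 0-based child positions of the elements matched by path."""
    return [children.index(e) for e in doc.findall(path)]


results = {
    "item[@id]": positions("item[@id]"),                  # attribute present
    "item[@id='c']": positions("item[@id='c']"),          # attribute value
    "item[name]": positions("item[name]"),                # has child element
    "item[name='second']": positions("item[name='second']"),  # child text
    "item[1]": positions("item[1]"),                      # 1-based index
    "item[last()]": positions("item[last()]"),
    "item[last()-1]": positions("item[last()-1]"),
}
for path, found in results.items():
    print(path, found)

# Prefixed names are expanded through the namespaces mapping.
ns_doc = ET.fromstring('<body xmlns="http://effbot.org/ns"><tag>text</tag></body>')
ns_text = ns_doc.findtext("ns:tag", namespaces={"ns": "http://effbot.org/ns"})
print(ns_text)
```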
#
# ElementTree
# $Id: ElementTree.py 2326 2005-03-17 07:45:21Z fredrik $
# $Id: ElementTree.py 3440 2008-07-18 14:45:01Z fredrik $
#
# light-weight XML support for Python 1.5.2 and later.
# light-weight XML support for Python 2.3 and later.
#
# history:
# 2001-10-20 fl created (from various sources)
# 2001-11-01 fl return root from parse method
# 2002-02-16 fl sort attributes in lexical order
# 2002-04-06 fl TreeBuilder refactoring, added PythonDoc markup
# 2002-05-01 fl finished TreeBuilder refactoring
# 2002-07-14 fl added basic namespace support to ElementTree.write
# 2002-07-25 fl added QName attribute support
# 2002-10-20 fl fixed encoding in write
# 2002-11-24 fl changed default encoding to ascii; fixed attribute encoding
# 2002-11-27 fl accept file objects or file names for parse/write
# 2002-12-04 fl moved XMLTreeBuilder back to this module
# 2003-01-11 fl fixed entity encoding glitch for us-ascii
# 2003-02-13 fl added XML literal factory
# 2003-02-21 fl added ProcessingInstruction/PI factory
# 2003-05-11 fl added tostring/fromstring helpers
# 2003-05-26 fl added ElementPath support
# 2003-07-05 fl added makeelement factory method
# 2003-07-28 fl added more well-known namespace prefixes
# 2003-08-15 fl fixed typo in ElementTree.findtext (Thomas Dartsch)
# 2003-09-04 fl fall back on emulator if ElementPath is not installed
# 2003-10-31 fl markup updates
# 2003-11-15 fl fixed nested namespace bug
# 2004-03-28 fl added XMLID helper
# 2004-06-02 fl added default support to findtext
# 2004-06-08 fl fixed encoding of non-ascii element/attribute names
# 2004-08-23 fl take advantage of post-2.1 expat features
# 2005-02-01 fl added iterparse implementation
# 2005-03-02 fl fixed iterparse support for pre-2.2 versions
# history (since 1.2.6):
# 2005-11-12 fl added tostringlist/fromstringlist helpers
# 2006-07-05 fl merged in selected changes from the 1.3 sandbox
# 2006-07-05 fl removed support for 2.1 and earlier
# 2007-06-21 fl added deprecation/future warnings
# 2007-08-25 fl added doctype hook, added parser version attribute etc
# 2007-08-26 fl added new serializer code (better namespace handling, etc)
# 2007-08-27 fl warn for broken /tag searches on tree level
# 2007-09-02 fl added html/text methods to serializer (experimental)
# 2007-09-05 fl added method argument to tostring/tostringlist
# 2007-09-06 fl improved error handling
# 2007-09-13 fl added itertext, iterfind; assorted cleanups
# 2007-12-15 fl added C14N hooks, copy method (experimental)
#
# Copyright (c) 1999-2005 by Fredrik Lundh. All rights reserved.
# Copyright (c) 1999-2008 by Fredrik Lundh. All rights reserved.
#
# fredrik@pythonware.com
# http://www.pythonware.com
......@@ -42,7 +26,7 @@
# --------------------------------------------------------------------
# The ElementTree toolkit is
#
# Copyright (c) 1999-2005 by Fredrik Lundh
# Copyright (c) 1999-2008 by Fredrik Lundh
#
# By obtaining, using, and/or copying this software and/or its
# associated documentation, you agree that you have read, understood,
......@@ -68,25 +52,28 @@
# --------------------------------------------------------------------
# Licensed to PSF under a Contributor Agreement.
# See http://www.python.org/2.4/license for licensing details.
# See http://www.python.org/psf/license for licensing details.
__all__ = [
# public symbols
"Comment",
"dump",
"Element", "ElementTree",
"fromstring",
"fromstring", "fromstringlist",
"iselement", "iterparse",
"parse",
"parse", "ParseError",
"PI", "ProcessingInstruction",
"QName",
"SubElement",
"tostring",
"tostring", "tostringlist",
"TreeBuilder",
"VERSION", "XML",
"VERSION",
"XML",
"XMLParser", "XMLTreeBuilder",
]
VERSION = "1.3.0"
##
# The <b>Element</b> type is a flexible container object, designed to
# store hierarchical data structures in memory. The type can be
......@@ -102,36 +89,86 @@ __all__ = [
# <li>a number of <i>child elements</i>, stored in a Python sequence</li>
# </ul>
#
# To create an element instance, use the {@link #Element} or {@link
# #SubElement} factory functions.
# To create an element instance, use the {@link #Element} constructor
# or the {@link #SubElement} factory function.
# <p>
# The {@link #ElementTree} class can be used to wrap an element
# structure, and convert it from and to XML.
##
import sys, re
import sys
import re
import warnings
class _SimpleElementPath:
# emulate pre-1.2 find/findtext/findall behaviour
def find(self, element, tag, namespaces=None):
for elem in element:
if elem.tag == tag:
return elem
return None
def findtext(self, element, tag, default=None, namespaces=None):
elem = self.find(element, tag)
if elem is None:
return default
return elem.text or ""
def iterfind(self, element, tag, namespaces=None):
if tag[:3] == ".//":
for elem in element.iter(tag[3:]):
yield elem
for elem in element:
if elem.tag == tag:
yield elem
def findall(self, element, tag, namespaces=None):
return list(self.iterfind(element, tag, namespaces))
try:
from . import ElementPath
except ImportError:
ElementPath = _SimpleElementPath()
##
# Parser error. This is a subclass of <b>SyntaxError</b>.
# <p>
# In addition to the exception value, an exception instance contains a
# specific exception code in the <b>code</b> attribute, and the line and
# column of the error in the <b>position</b> attribute.
class ParseError(SyntaxError):
pass
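# Illustration only, not part of the commit: a small sketch of catching the
# new ParseError and reading its code and position attributes. The helper
# name try_parse is hypothetical.

```python
import xml.etree.ElementTree as ET


def try_parse(text):
    """Return (root, None) on success or (None, error) on a parse failure."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        return None, err


root, err = try_parse("<a><b></a>")  # mismatched closing tag
if err is not None:
    line, column = err.position      # position is a (line, column) pair
    print("parse failed: code=%s line=%d column=%d" % (err.code, line, column))

ok_root, ok_err = try_parse("<a/>")
```

# Because ParseError subclasses SyntaxError, code that already catches
# SyntaxError keeps working.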
from . import ElementPath
# --------------------------------------------------------------------
# TODO: add support for custom namespace resolvers/default namespaces
# TODO: add improved support for incremental parsing
##
# Checks if an object appears to be a valid element object.
#
# @param element An element instance.
# @return A true value if this is an element object.
# @defreturn flag
VERSION = "1.2.6"
def iselement(element):
# FIXME: not sure about this; might be a better idea to look
# for tag/attrib/text attributes
return isinstance(element, Element) or hasattr(element, "tag")
##
# Internal element class. This class defines the Element interface,
# and provides a reference implementation of this interface.
# Element class. This class defines the Element interface, and
# provides a reference implementation of this interface.
# <p>
# You should not create instances of this class directly. Use the
# appropriate factory functions instead, such as {@link #Element}
# and {@link #SubElement}.
# The element name, attribute names, and attribute values can be
# either ASCII strings (ordinary Python strings containing only 7-bit
# ASCII characters) or Unicode strings.
#
# @param tag The element name.
# @param attrib An optional dictionary, containing element attributes.
# @param **extra Additional attributes, given as keyword arguments.
# @see Element
# @see SubElement
# @see Comment
# @see ProcessingInstruction
class _ElementInterface:
class Element:
# <tag attrib>text<child/>...</tag>tail
##
......@@ -141,34 +178,41 @@ class _ElementInterface:
##
# (Attribute) Element attribute dictionary. Where possible, use
# {@link #_ElementInterface.get},
# {@link #_ElementInterface.set},
# {@link #_ElementInterface.keys}, and
# {@link #_ElementInterface.items} to access
# {@link #Element.get},
# {@link #Element.set},
# {@link #Element.keys}, and
# {@link #Element.items} to access
# element attributes.
attrib = None
##
# (Attribute) Text before first subelement. This is either a
# string or the value None, if there was no text.
# string or the value None. Note that if there was no text, this
# attribute may be either None or an empty string, depending on
# the parser.
text = None
##
# (Attribute) Text after this element's end tag, but before the
# next sibling element's start tag. This is either a string or
# the value None, if there was no text.
# the value None. Note that if there was no text, this attribute
# may be either None or an empty string, depending on the parser.
tail = None # text after end tag, if any
def __init__(self, tag, attrib):
# constructor
def __init__(self, tag, attrib={}, **extra):
attrib = attrib.copy()
attrib.update(extra)
self.tag = tag
self.attrib = attrib
self._children = []
def __repr__(self):
return "<Element %s at %x>" % (self.tag, id(self))
return "<Element %s at 0x%x>" % (repr(self.tag), id(self))
##
# Creates a new element object of the same type as this element.
......@@ -178,18 +222,41 @@ class _ElementInterface:
# @return A new element instance.
def makeelement(self, tag, attrib):
return Element(tag, attrib)
return self.__class__(tag, attrib)
##
# Returns the number of subelements.
# (Experimental) Copies the current element. This creates a
# shallow copy; subelements will be shared with the original tree.
#
# @return A new element instance.
def copy(self):
elem = self.makeelement(self.tag, self.attrib)
elem.text = self.text
elem.tail = self.tail
elem[:] = self
return elem
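# Illustration only, not part of the commit: a sketch of what "shallow copy"
# means for an element. It uses the standard copy.copy function instead of
# the experimental copy() method, on the assumption that both share the
# subelements with the original in the same way.

```python
import copy
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><child>x</child></root>")
clone = copy.copy(root)

# The clone is a distinct element with its own child list...
clone.append(ET.Element("extra"))

# ...but the child objects themselves are shared with the original tree.
clone[0].text = "changed"
```

# After this, root still has one child, and that child's text is "changed".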
##
# Returns the number of subelements. Note that this only counts
# full elements; to check if there's any content in an element, you
# have to check both the length and the <b>text</b> attribute.
#
# @return The number of subelements.
def __len__(self):
return len(self._children)
def __bool__(self):
warnings.warn(
"The behavior of this method will change in future versions. "
"Use specific 'len(elem)' or 'elem is not None' test instead.",
FutureWarning, stacklevel=2
)
return len(self._children) != 0 # emulate old behaviour, for now
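# Illustration only, not part of the commit: a sketch of the explicit tests
# the warning recommends in place of truth-testing an element. The sample
# document is made up.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><leaf>text</leaf></root>")
leaf = root.find("leaf")
absent = root.find("nothing")

found = leaf is not None          # the element exists
not_found = absent is None        # find() returned None
has_children = len(leaf) > 0      # counts subelements only
has_content = len(leaf) > 0 or bool(leaf.text)  # also considers text
```

# Here leaf exists and has text but no subelements, which is exactly the
# case where a plain truth test would be misleading.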
##
# Returns the given subelement.
# Returns the given subelement, by index.
#
# @param index What subelement to return.
# @return The given subelement.
......@@ -199,19 +266,22 @@ class _ElementInterface:
return self._children[index]
##
# Replaces the given subelement.
# Replaces the given subelement, by index.
#
# @param index What subelement to replace.
# @param element The new element value.
# @exception IndexError If the given element does not exist.
# @exception AssertionError If element is not a valid object.
def __setitem__(self, index, element):
assert iselement(element)
# if isinstance(index, slice):
# for elt in element:
# assert iselement(elt)
# else:
# assert iselement(element)
self._children[index] = element
##
# Deletes the given subelement.
# Deletes the given subelement, by index.
#
# @param index What subelement to delete.
# @exception IndexError If the given element does not exist.
......@@ -220,118 +290,121 @@ class _ElementInterface:
del self._children[index]
##
# Returns a list containing subelements in the given range.
# Adds a subelement to the end of this element. In document order,
# the new element will appear after the last existing subelement (or
# directly after the text, if it's the first subelement), but before
# the end tag for this element.
#
# @param start The first subelement to return.
# @param stop The first subelement that shouldn't be returned.
# @return A sequence object containing subelements.
# @param element The element to add.
def __getslice__(self, start, stop):
return self._children[start:stop]
def append(self, element):
# assert iselement(element)
self._children.append(element)
##
# Replaces a number of subelements with elements from a sequence.
# Appends subelements from a sequence.
#
# @param start The first subelement to replace.
# @param stop The first subelement that shouldn't be replaced.
# @param elements A sequence object with zero or more elements.
# @exception AssertionError If a sequence member is not a valid object.
def __setslice__(self, start, stop, elements):
for element in elements:
assert iselement(element)
self._children[start:stop] = list(elements)
##
# Deletes a number of subelements.
#
# @param start The first subelement to delete.
# @param stop The first subelement to leave in there.
def __delslice__(self, start, stop):
del self._children[start:stop]
##
# Adds a subelement to the end of this element.
#
# @param element The element to add.
# @exception AssertionError If a sequence member is not a valid object.
# @since 1.3
def append(self, element):
assert iselement(element)
self._children.append(element)
def extend(self, elements):
# for element in elements:
# assert iselement(element)
self._children.extend(elements)
##
# Inserts a subelement at the given position in this element.
#
# @param index Where to insert the new subelement.
# @exception AssertionError If the element is not a valid object.
def insert(self, index, element):
assert iselement(element)
# assert iselement(element)
self._children.insert(index, element)
##
# Removes a matching subelement. Unlike the <b>find</b> methods,
# this method compares elements based on identity, not on tag
# value or contents.
# value or contents. To remove subelements by other means, the
# easiest way is often to use a list comprehension to select what
# elements to keep, and use slice assignment to update the parent
# element.
#
# @param element What element to remove.
# @exception ValueError If a matching element could not be found.
# @exception AssertionError If the element is not a valid object.
def remove(self, element):
assert iselement(element)
# assert iselement(element)
self._children.remove(element)
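# Illustration only, not part of the commit: a sketch of the approach the
# comment above describes, selecting the subelements to keep with a list
# comprehension and updating the parent by slice assignment. The "keep"
# attribute is a made-up criterion.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<items><item keep='1'/><item keep='0'/><item keep='1'/></items>"
)

# Keep only the children that satisfy the criterion.
root[:] = [child for child in root if child.get("keep") == "1"]

# remove() compares by identity, so a look-alike element is not found.
lookalike = ET.Element("item", keep="1")
try:
    root.remove(lookalike)
    removed = True
except ValueError:
    removed = False
```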
##
# Returns all subelements. The elements are returned in document
# order.
# (Deprecated) Returns all subelements. The elements are returned
# in document order.
#
# @return A list of subelements.
# @defreturn list of Element instances
def getchildren(self):
warnings.warn(
"This method will be removed in future versions. "
"Use 'list(elem)' or iteration over elem instead.",
DeprecationWarning, stacklevel=2
)
return self._children
##
# Finds the first matching subelement, by tag name or path.
#
# @param path What element to look for.
# @keyparam namespaces Optional namespace prefix map.
# @return The first matching element, or None if no element was found.
# @defreturn Element or None
def find(self, path):
return ElementPath.find(self, path)
def find(self, path, namespaces=None):
return ElementPath.find(self, path, namespaces)
##
# Finds text for the first matching subelement, by tag name or path.
#
# @param path What element to look for.
# @param default What to return if the element was not found.
# @keyparam namespaces Optional namespace prefix map.
# @return The text content of the first matching element, or the
# default value if no element was found. Note that if the element
# has is found, but has no text content, this method returns an
# is found, but has no text content, this method returns an
# empty string.
# @defreturn string
def findtext(self, path, default=None):
return ElementPath.findtext(self, path, default)
def findtext(self, path, default=None, namespaces=None):
return ElementPath.findtext(self, path, default, namespaces)
##
# Finds all matching subelements, by tag name or path.
#
# @param path What element to look for.
# @return A list or iterator containing all matching elements,
# @keyparam namespaces Optional namespace prefix map.
# @return A list or other sequence containing all matching elements,
# in document order.
# @defreturn list of Element instances
def findall(self, path):
return ElementPath.findall(self, path)
def findall(self, path, namespaces=None):
return ElementPath.findall(self, path, namespaces)
##
# Finds all matching subelements, by tag name or path.
#
# @param path What element to look for.
# @keyparam namespaces Optional namespace prefix map.
# @return An iterator or sequence containing all matching elements,
# in document order.
# @defreturn a generated sequence of Element instances
def iterfind(self, path, namespaces=None):
return ElementPath.iterfind(self, path, namespaces)
##
# Resets an element. This function removes all subelements, clears
# all attributes, and sets the text and tail attributes to None.
# all attributes, and sets the <b>text</b> and <b>tail</b> attributes
# to None.
def clear(self):
self.attrib.clear()
......@@ -339,7 +412,8 @@ class _ElementInterface:
self.text = self.tail = None
##
# Gets an element attribute.
# Gets an element attribute. Equivalent to <b>attrib.get</b>, but
# some implementations may handle this a bit more efficiently.
#
# @param key What attribute to look for.
# @param default What to return if the attribute was not found.
......@@ -351,7 +425,8 @@ class _ElementInterface:
return self.attrib.get(key, default)
##
# Sets an element attribute.
# Sets an element attribute. Equivalent to <b>attrib[key] = value</b>,
# but some implementations may handle this a bit more efficiently.
#
# @param key What attribute to set.
# @param value The attribute value.
......@@ -362,6 +437,7 @@ class _ElementInterface:
##
# Gets a list of attribute names. The names are returned in an
# arbitrary order (just like for an ordinary Python dictionary).
# Equivalent to <b>attrib.keys()</b>.
#
# @return A list of element attribute names.
# @defreturn list of strings
......@@ -371,7 +447,7 @@ class _ElementInterface:
##
# Gets element attributes, as a sequence. The attributes are
# returned in an arbitrary order.
# returned in an arbitrary order. Equivalent to <b>attrib.items()</b>.
#
# @return A list of (name, value) tuples for all attributes.
# @defreturn list of (string, string) tuples
......@@ -384,45 +460,55 @@ class _ElementInterface:
# and all subelements, in document order, and returns all elements
# with a matching tag.
# <p>
# If the tree structure is modified during iteration, the result
# is undefined.
# If the tree structure is modified during iteration, new or removed
# elements may or may not be included. To get a stable set, use the
# list() function on the iterator, and loop over the resulting list.
#
# @param tag What tags to look for (default is to return all elements).
# @return A list or iterator containing all the matching elements.
# @defreturn list or iterator
# @return An iterator containing all the matching elements.
# @defreturn iterator
def getiterator(self, tag=None):
nodes = []
def iter(self, tag=None):
if tag == "*":
tag = None
if tag is None or self.tag == tag:
nodes.append(self)
for node in self._children:
nodes.extend(node.getiterator(tag))
return nodes
yield self
for e in self._children:
for e in e.iter(tag):
yield e
# compatibility
_Element = _ElementInterface
# compatibility
def getiterator(self, tag=None):
# Change for a DeprecationWarning in 1.4
warnings.warn(
"This method will be removed in future versions. "
"Use 'elem.iter()' or 'list(elem.iter())' instead.",
PendingDeprecationWarning, stacklevel=2
)
return list(self.iter(tag))
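# Illustration only, not part of the commit: a sketch of iter() and of the
# list() snapshot recommended when the tree is modified while looping. The
# sample document is made up.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b/><c><b/></c></a>")

# iter() walks the element itself and all subelements in document order.
tags_before = [elem.tag for elem in root.iter()]

# Take stable snapshots before mutating, both of the traversal and of each
# parent's child list.
for parent in list(root.iter()):
    for child in list(parent):
        if child.tag == "b":
            parent.remove(child)

tags_after = [elem.tag for elem in root.iter()]
only_c = [elem.tag for elem in root.iter("c")]
```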
##
# Element factory. This function returns an object implementing the
# standard Element interface. The exact class or type of that object
# is implementation dependent, but it will always be compatible with
# the {@link #_ElementInterface} class in this module.
# <p>
# The element name, attribute names, and attribute values can be
# either 8-bit ASCII strings or Unicode strings.
#
# @param tag The element name.
# @param attrib An optional dictionary, containing element attributes.
# @param **extra Additional attributes, given as keyword arguments.
# @return An element instance.
# @defreturn Element
##
# Creates a text iterator. The iterator loops over this element
# and all subelements, in document order, and returns all inner
# text.
#
# @return An iterator containing all inner text.
# @defreturn iterator
def Element(tag, attrib={}, **extra):
attrib = attrib.copy()
attrib.update(extra)
return _ElementInterface(tag, attrib)
def itertext(self):
tag = self.tag
if not isinstance(tag, str) and tag is not None:
return
if self.text:
yield self.text
for e in self:
for s in e.itertext():
yield s
if e.tail:
yield e.tail
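# Illustration only, not part of the commit: a sketch of itertext(), which
# yields the text and tail strings of an element and its subelements in
# document order. The sample markup is made up.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring("<p>Hello <b>bold</b> and <i>italic</i> world</p>")

parts = list(para.itertext())   # individual text and tail fragments
plain = "".join(parts)          # flattened text content
```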
# compatibility
_Element = _ElementInterface = Element
##
# Subelement factory. This function creates an element instance, and
......@@ -447,7 +533,8 @@ def SubElement(parent, tag, attrib={}, **extra):
##
# Comment element factory. This factory function creates a special
# element that will be serialized as an XML comment.
# element that will be serialized as an XML comment by the standard
# serializer.
# <p>
# The comment string can be either an 8-bit ASCII string or a Unicode
# string.
......@@ -463,7 +550,8 @@ def Comment(text=None):
##
# PI element factory. This factory function creates a special element
# that will be serialized as an XML processing instruction.
# that will be serialized as an XML processing instruction by the standard
# serializer.
#
# @param target A string containing the PI target.
# @param text A string containing the PI contents, if any.
......@@ -523,19 +611,21 @@ class QName:
return self.text != other.text
return self.text != other
# --------------------------------------------------------------------
##
# ElementTree wrapper class. This class represents an entire element
# hierarchy, and adds some extra support for serialization to and from
# standard XML.
#
# @param element Optional root element.
# @keyparam file Optional file handle or name. If given, the
# @keyparam file Optional file handle or file name. If given, the
# tree is initialized with the contents of this XML file.
class ElementTree:
def __init__(self, element=None, file=None):
assert element is None or iselement(element)
# assert element is None or iselement(element)
self._root = element # first node
if file:
self.parse(file)
......@@ -557,25 +647,27 @@ class ElementTree:
# @param element An element instance.
def _setroot(self, element):
assert iselement(element)
# assert iselement(element)
self._root = element
##
# Loads an external XML document into this element tree.
#
# @param source A file name or file object.
# @param parser An optional parser instance. If not given, the
# standard {@link XMLTreeBuilder} parser is used.
# @param source A file name or file object. If a file object is
# given, it only has to implement a <b>read(n)</b> method.
# @keyparam parser An optional parser instance. If not given, the
# standard {@link XMLParser} parser is used.
# @return The document root element.
# @defreturn Element
# @exception ParseError If the parser fails to parse the document.
def parse(self, source, parser=None):
if not hasattr(source, "read"):
source = open(source, "rb")
if not parser:
parser = XMLTreeBuilder()
parser = XMLParser(target=TreeBuilder())
while 1:
data = source.read(32768)
data = source.read(65536)
if not data:
break
parser.feed(data)
......@@ -590,23 +682,40 @@ class ElementTree:
# @return An iterator.
# @defreturn iterator
def iter(self, tag=None):
# assert self._root is not None
return self._root.iter(tag)
# compatibility
def getiterator(self, tag=None):
assert self._root is not None
return self._root.getiterator(tag)
# Change for a DeprecationWarning in 1.4
warnings.warn(
"This method will be removed in future versions. "
"Use 'tree.iter()' or 'list(tree.iter())' instead.",
PendingDeprecationWarning, stacklevel=2
)
return list(self.iter(tag))
##
# Finds the first toplevel element with given tag.
# Same as getroot().find(path).
#
# @param path What element to look for.
# @keyparam namespaces Optional namespace prefix map.
# @return The first matching element, or None if no element was found.
# @defreturn Element or None
def find(self, path):
assert self._root is not None
def find(self, path, namespaces=None):
# assert self._root is not None
if path[:1] == "/":
path = "." + path
return self._root.find(path)
warnings.warn(
"This search is broken in 1.3 and earlier, and will be "
"fixed in a future version. If you rely on the current "
"behaviour, change it to %r" % path,
FutureWarning, stacklevel=2
)
return self._root.find(path, namespaces)
##
# Finds the element text for the first toplevel element with given
......@@ -614,153 +723,353 @@ class ElementTree:
#
# @param path What toplevel element to look for.
# @param default What to return if the element was not found.
# @keyparam namespaces Optional namespace prefix map.
# @return The text content of the first matching element, or the
# default value if no element was found. Note that if the element
# has is found, but has no text content, this method returns an
# is found, but has no text content, this method returns an
# empty string.
# @defreturn string
def findtext(self, path, default=None):
assert self._root is not None
def findtext(self, path, default=None, namespaces=None):
# assert self._root is not None
if path[:1] == "/":
path = "." + path
return self._root.findtext(path, default)
warnings.warn(
"This search is broken in 1.3 and earlier, and will be "
"fixed in a future version. If you rely on the current "
"behaviour, change it to %r" % path,
FutureWarning, stacklevel=2
)
return self._root.findtext(path, default, namespaces)
##
# Finds all toplevel elements with the given tag.
# Same as getroot().findall(path).
#
# @param path What element to look for.
# @keyparam namespaces Optional namespace prefix map.
# @return A list or iterator containing all matching elements,
# in document order.
# @defreturn list of Element instances
def findall(self, path):
assert self._root is not None
def findall(self, path, namespaces=None):
# assert self._root is not None
if path[:1] == "/":
path = "." + path
return self._root.findall(path)
warnings.warn(
"This search is broken in 1.3 and earlier, and will be "
"fixed in a future version. If you rely on the current "
"behaviour, change it to %r" % path,
FutureWarning, stacklevel=2
)
return self._root.findall(path, namespaces)
##
# Finds all matching subelements, by tag name or path.
# Same as getroot().iterfind(path).
#
# @param path What element to look for.
# @keyparam namespaces Optional namespace prefix map.
# @return An iterator or sequence containing all matching elements,
# in document order.
# @defreturn a generated sequence of Element instances
def iterfind(self, path, namespaces=None):
# assert self._root is not None
if path[:1] == "/":
path = "." + path
warnings.warn(
"This search is broken in 1.3 and earlier, and will be "
"fixed in a future version. If you rely on the current "
"behaviour, change it to %r" % path,
FutureWarning, stacklevel=2
)
return self._root.iterfind(path, namespaces)
##
# Writes the element tree to a file, as XML.
#
# @def write(file, **options)
# @param file A file name, or a file object opened for writing.
# @param encoding Optional output encoding (default is None)
def write(self, file, encoding=None):
assert self._root is not None
if not hasattr(file, "write"):
if encoding:
file = open(file, "wb")
# @param **options Options, given as keyword arguments.
# @keyparam encoding Optional output encoding (default is None).
# @keyparam method Optional output method ("xml", "html", "text" or
# "c14n"; default is "xml").
# @keyparam xml_declaration Controls if an XML declaration should
# be added to the file. Use False for never, True for always,
# None for only if not US-ASCII or UTF-8. None is default.
def write(self, file_or_filename,
# keyword arguments
encoding=None,
xml_declaration=None,
default_namespace=None,
method=None):
# assert self._root is not None
if not method:
method = "xml"
elif method not in _serialize:
# FIXME: raise an ImportError for c14n if ElementC14N is missing?
raise ValueError("unknown method %r" % method)
if hasattr(file_or_filename, "write"):
file = file_or_filename
else:
file = open(file, "w")
if encoding and encoding != "utf-8":
file.write(_encode("<?xml version='1.0' encoding='%s'?>\n" % encoding, encoding))
self._write(file, self._root, encoding, {})
def _write(self, file, node, encoding, namespaces):
# write XML to file
tag = node.tag
if tag is Comment:
file.write(_encode("<!-- %s -->" % node.text, encoding))
elif tag is ProcessingInstruction:
file.write(_encode("<?%s?>" % node.text, encoding))
if encoding:
file = open(file_or_filename, "wb")
else:
items = list(node.items())
xmlns_items = [] # new namespaces in this scope
try:
if isinstance(tag, QName) or tag[:1] == "{":
tag, xmlns = fixtag(tag, namespaces)
if xmlns: xmlns_items.append(xmlns)
except TypeError:
_raise_serialization_error(tag)
file.write(_encode("<" + tag, encoding))
if items or xmlns_items:
items.sort() # lexical order
for k, v in items:
try:
if isinstance(k, QName) or k[:1] == "{":
k, xmlns = fixtag(k, namespaces)
if xmlns: xmlns_items.append(xmlns)
except TypeError:
_raise_serialization_error(k)
file = open(file_or_filename, "w")
if encoding:
def write(text):
try:
if isinstance(v, QName):
v, xmlns = fixtag(v, namespaces)
if xmlns: xmlns_items.append(xmlns)
except TypeError:
_raise_serialization_error(v)
file.write(_encode(" %s=\"%s\"" % (k, _escape_attrib(v)), encoding))
for k, v in xmlns_items:
file.write(_encode(" %s=\"%s\"" % (k, _escape_attrib(v)), encoding))
if node.text or len(node):
file.write(_encode(">", encoding))
if node.text:
file.write(_encode_cdata(node.text, encoding))
for n in node:
self._write(file, n, encoding, namespaces)
file.write(_encode("</" + tag + ">", encoding))
return file.write(text.encode(encoding,
"xmlcharrefreplace"))
except (TypeError, AttributeError):
_raise_serialization_error(text)
else:
write = file.write
if not encoding:
if method == "c14n":
encoding = "utf-8"
else:
encoding = None
elif xml_declaration or (xml_declaration is None and
encoding not in ("utf-8", "us-ascii")):
if method == "xml":
encoding_ = encoding
if not encoding:
# Retrieve the default encoding for the xml declaration
import locale
encoding_ = locale.getpreferredencoding()
write("<?xml version='1.0' encoding='%s'?>\n" % encoding_)
if method == "text":
_serialize_text(write, self._root)
else:
file.write(_encode(" />", encoding))
for k, v in xmlns_items:
del namespaces[v]
if node.tail:
file.write(_encode_cdata(node.tail, encoding))
qnames, namespaces = _namespaces(self._root, default_namespace)
serialize = _serialize[method]
serialize(write, self._root, qnames, namespaces)
if file_or_filename is not file:
file.close()
def write_c14n(self, file):
# lxml.etree compatibility. use output method instead
return self.write(file, method="c14n")
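# Illustration only, not part of the commit: a sketch of the keyword
# arguments to write() (encoding, xml_declaration, method), run against the
# public xml.etree.ElementTree API with in-memory buffers. Element names and
# contents are made up.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("greeting", lang="en")
root.text = "hello"
ET.SubElement(root, "b").text = " world"
tree = ET.ElementTree(root)

# XML output with an explicit declaration.
xml_buf = io.BytesIO()
tree.write(xml_buf, encoding="utf-8", xml_declaration=True)

# Text-only output: markup is dropped, text content is kept.
text_buf = io.BytesIO()
tree.write(text_buf, encoding="utf-8", method="text")

# An unknown method is rejected with ValueError.
try:
    tree.write(io.BytesIO(), encoding="utf-8", method="bogus")
    rejected = False
except ValueError:
    rejected = True
```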
# --------------------------------------------------------------------
# helpers
# serialization support
##
# Checks if an object appears to be a valid element object.
#
# @param An element instance.
# @return A true value if this is an element object.
# @defreturn flag
def _namespaces(elem, default_namespace=None):
# identify namespaces used in this tree
def iselement(element):
# FIXME: not sure about this; might be a better idea to look
# for tag/attrib/text attributes
return isinstance(element, _ElementInterface) or hasattr(element, "tag")
# maps qnames to *encoded* prefix:local names
qnames = {None: None}
##
# Writes an element tree or element structure to sys.stdout. This
# function should be used for debugging only.
# <p>
# The exact output format is implementation dependent. In this
# version, it's written as an ordinary XML file.
#
# @param elem An element tree or an individual element.
# maps uri:s to prefixes
namespaces = {}
if default_namespace:
namespaces[default_namespace] = ""
def dump(elem):
# debugging
if not isinstance(elem, ElementTree):
elem = ElementTree(elem)
elem.write(sys.stdout)
tail = elem.getroot().tail
if not tail or tail[-1] != "\n":
sys.stdout.write("\n")
def add_qname(qname):
# calculate serialized qname representation
try:
if qname[:1] == "{":
uri, tag = qname[1:].rsplit("}", 1)
prefix = namespaces.get(uri)
if prefix is None:
prefix = _namespace_map.get(uri)
if prefix is None:
prefix = "ns%d" % len(namespaces)
if prefix != "xml":
namespaces[uri] = prefix
if prefix:
qnames[qname] = "%s:%s" % (prefix, tag)
else:
qnames[qname] = tag # default element
else:
if default_namespace:
# FIXME: can this be handled in XML 1.0?
raise ValueError(
"cannot use non-qualified names with "
"default_namespace option"
)
qnames[qname] = qname
except TypeError:
_raise_serialization_error(qname)
def _encode(s, encoding):
if encoding:
return s.encode(encoding)
# populate qname and namespaces table
try:
iterate = elem.iter
except AttributeError:
iterate = elem.getiterator # cET compatibility
for elem in iterate():
tag = elem.tag
if isinstance(tag, QName) and tag.text not in qnames:
add_qname(tag.text)
elif isinstance(tag, str):
if tag not in qnames:
add_qname(tag)
elif tag is not None and tag is not Comment and tag is not PI:
_raise_serialization_error(tag)
for key, value in elem.items():
if isinstance(key, QName):
key = key.text
if key not in qnames:
add_qname(key)
if isinstance(value, QName) and value.text not in qnames:
add_qname(value.text)
text = elem.text
if isinstance(text, QName) and text.text not in qnames:
add_qname(text.text)
return qnames, namespaces
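# Illustration only, not part of the commit: a sketch of how the prefix
# assignment in _namespaces shows up in serialized output, with and without
# the default_namespace option. The "urn:example:ns" URI and the serialize
# helper are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

NS = "urn:example:ns"  # made-up namespace URI

root = ET.Element("{%s}root" % NS)
ET.SubElement(root, "{%s}child" % NS)
tree = ET.ElementTree(root)


def serialize(**options):
    """Write the tree to memory and return the bytes produced."""
    buf = io.BytesIO()
    tree.write(buf, encoding="utf-8", **options)
    return buf.getvalue()


prefixed = serialize()                       # generated prefix, e.g. ns0
defaulted = serialize(default_namespace=NS)  # no prefix on element names
```

# With default_namespace set, unqualified tag names raise ValueError, as the
# add_qname helper above shows.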
def _serialize_xml(write, elem, qnames, namespaces):
tag = elem.tag
text = elem.text
if tag is Comment:
write("<!--%s-->" % text)
elif tag is ProcessingInstruction:
write("<?%s?>" % text)
else:
tag = qnames[tag]
if tag is None:
if text:
write(_escape_cdata(text))
for e in elem:
_serialize_xml(write, e, qnames, None)
else:
write("<" + tag)
items = list(elem.items())
if items or namespaces:
if namespaces:
for v, k in sorted(namespaces.items(),
key=lambda x: x[1]): # sort on prefix
if k:
k = ":" + k
write(" xmlns%s=\"%s\"" % (
k,
_escape_attrib(v)
))
for k, v in sorted(items): # lexical order
if isinstance(k, QName):
k = k.text
if isinstance(v, QName):
v = qnames[v.text]
else:
v = _escape_attrib(v)
write(" %s=\"%s\"" % (qnames[k], v))
if text or len(elem):
write(">")
if text:
write(_escape_cdata(text))
for e in elem:
_serialize_xml(write, e, qnames, None)
write("</" + tag + ">")
else:
return s
write(" />")
if elem.tail:
write(_escape_cdata(elem.tail))
_escape = re.compile(r"[&<>\"\u0080-\uffff]+")
HTML_EMPTY = ("area", "base", "basefont", "br", "col", "frame", "hr",
"img", "input", "isindex", "link", "meta", "param")
try:
HTML_EMPTY = set(HTML_EMPTY)
except NameError:
pass
_escape_map = {
"&": "&amp;",
"<": "&lt;",
">": "&gt;",
'"': "&quot;",
def _serialize_html(write, elem, qnames, namespaces):
tag = elem.tag
text = elem.text
if tag is Comment:
write("<!--%s-->" % _escape_cdata(text))
elif tag is ProcessingInstruction:
write("<?%s?>" % _escape_cdata(text))
else:
tag = qnames[tag]
if tag is None:
if text:
write(_escape_cdata(text))
for e in elem:
_serialize_html(write, e, qnames, None)
else:
write("<" + tag)
items = list(elem.items())
if items or namespaces:
if namespaces:
for v, k in sorted(namespaces.items(),
key=lambda x: x[1]): # sort on prefix
if k:
k = ":" + k
write(" xmlns%s=\"%s\"" % (
k,
_escape_attrib(v)
))
for k, v in sorted(items): # lexical order
if isinstance(k, QName):
k = k.text
if isinstance(v, QName):
v = qnames[v.text]
else:
v = _escape_attrib_html(v)
# FIXME: handle boolean attributes
write(" %s=\"%s\"" % (qnames[k], v))
write(">")
tag = tag.lower()
if text:
if tag == "script" or tag == "style":
write(text)
else:
write(_escape_cdata(text))
for e in elem:
_serialize_html(write, e, qnames, None)
if tag not in HTML_EMPTY:
write("</" + tag + ">")
if elem.tail:
write(_escape_cdata(elem.tail))
def _serialize_text(write, elem):
for part in elem.itertext():
write(part)
if elem.tail:
write(elem.tail)
_serialize = {
"xml": _serialize_xml,
"html": _serialize_html,
"text": _serialize_text,
# this optional method is imported at the end of the module
# "c14n": _serialize_c14n,
}
##
# Registers a namespace prefix. The registry is global, and any
# existing mapping for either the given prefix or the namespace URI
# will be removed.
#
# @param prefix Namespace prefix.
# @param uri Namespace uri. Tags and attributes in this namespace
# will be serialized with the given prefix, if at all possible.
# @exception ValueError If the prefix is reserved, or is otherwise
# invalid.
def register_namespace(prefix, uri):
if re.match(r"ns\d+$", prefix):
raise ValueError("Prefix format reserved for internal use")
for k, v in list(_namespace_map.items()):  # copy: entries are deleted below
if k == uri or v == prefix:
del _namespace_map[k]
_namespace_map[uri] = prefix
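The effect of register_namespace on serialized output can be sketched as follows. This is a minimal illustration against the public xml.etree.ElementTree API of a current Python 3; the URI and the "demo" prefix are made up for the example, and encoding="unicode" is the current standard-library spelling for text output, not something taken from this revision.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and prefix, chosen only for illustration.
URI = "http://example.com/ns/demo"
ET.register_namespace("demo", URI)

root = ET.Element("{%s}root" % URI)
ET.SubElement(root, "{%s}child" % URI)

# The registered prefix is used instead of a generated ns0, ns1, ... prefix.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Prefixes of the form ns<digits> are reserved for generated prefixes.
reserved_error = None
try:
    ET.register_namespace("ns0", URI)
except ValueError as exc:
    reserved_error = str(exc)
print(reserved_error)
```

Because the registry is global, a registration made this way affects every later serialization in the same process.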
_namespace_map = {
# "well-known" namespace prefixes
"http://www.w3.org/XML/1998/namespace": "xml",
"http://www.w3.org/1999/xhtml": "html",
"http://www.w3.org/1999/02/22-rdf-syntax-ns#": "rdf",
"http://schemas.xmlsoap.org/wsdl/": "wsdl",
# xml schema
"http://www.w3.org/2001/XMLSchema": "xs",
"http://www.w3.org/2001/XMLSchema-instance": "xsi",
# dublin core
"http://purl.org/dc/elements/1.1/": "dc",
}
def _raise_serialization_error(text):
......@@ -768,35 +1077,18 @@ def _raise_serialization_error(text):
"cannot serialize %r (type %s)" % (text, type(text).__name__)
)
def _encode_entity(text, pattern=_escape):
# map reserved and non-ascii characters to numerical entities
def escape_entities(m, map=_escape_map):
out = []
append = out.append
for char in m.group():
text = map.get(char)
if text is None:
text = "&#%d;" % ord(char)
append(text)
return "".join(out)
try:
return _encode(pattern.sub(escape_entities, text), "ascii")
except TypeError:
_raise_serialization_error(text)
#
# the following functions assume an ascii-compatible encoding
# (or "utf-16")
def _encode_cdata(text, encoding):
def _escape_cdata(text):
# escape character data
try:
# it's worth avoiding do-nothing calls for strings that are
# shorter than 500 characters, or so. assume that's, by far,
# the most common case in most applications.
if "&" in text:
text = text.replace("&", "&amp;")
if "<" in text:
text = text.replace("<", "&lt;")
if ">" in text:
text = text.replace(">", "&gt;")
if encoding:
return text.encode(encoding, "xmlcharrefreplace")
else:
return text
except (TypeError, AttributeError):
_raise_serialization_error(text)
......@@ -804,41 +1096,108 @@ def _encode_cdata(text, encoding):
def _escape_attrib(text):
# escape attribute value
try:
if "&" in text:
text = text.replace("&", "&amp;")
text = text.replace("'", "&apos;") # FIXME: overkill
text = text.replace("\"", "&quot;")
if "<" in text:
text = text.replace("<", "&lt;")
if ">" in text:
text = text.replace(">", "&gt;")
if "\"" in text:
text = text.replace("\"", "&quot;")
if "\n" in text:
text = text.replace("\n", "&#10;")
return text
except (TypeError, AttributeError):
_raise_serialization_error(text)
def fixtag(tag, namespaces):
# given a decorated tag (of the form {uri}tag), return prefixed
# tag and namespace declaration, if any
if isinstance(tag, QName):
tag = tag.text
namespace_uri, tag = tag[1:].split("}", 1)
prefix = namespaces.get(namespace_uri)
if prefix is None:
prefix = _namespace_map.get(namespace_uri)
if prefix is None:
prefix = "ns%d" % len(namespaces)
namespaces[namespace_uri] = prefix
if prefix == "xml":
xmlns = None
else:
xmlns = ("xmlns:%s" % prefix, namespace_uri)
def _escape_attrib_html(text):
# escape attribute value
try:
if "&" in text:
text = text.replace("&", "&amp;")
if ">" in text:
text = text.replace(">", "&gt;")
if "\"" in text:
text = text.replace("\"", "&quot;")
return text
except (TypeError, AttributeError):
_raise_serialization_error(text)
# --------------------------------------------------------------------
##
# Generates a string representation of an XML element, including all
# subelements. If encoding is None, the return type is a string;
# otherwise it is a bytes array.
#
# @param element An Element instance.
# @keyparam encoding Optional output encoding (default is None).
# @keyparam method Optional output method ("xml", "html", "text" or
# "c14n"; default is "xml").
# @return An (optionally) encoded string containing the XML data.
# @defreturn string
def tostring(element, encoding=None, method=None):
class dummy:
pass
data = []
file = dummy()
file.write = data.append
ElementTree(element).write(file, encoding, method=method)
if encoding:
return b"".join(data)
else:
xmlns = None
return "%s:%s" % (prefix, tag), xmlns
return "".join(data)
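A short sketch of how the method keyword changes the output of tostring. It assumes a current Python 3 standard library, where text output is requested with encoding="unicode" (in the revision shown here, encoding=None plays that role); the element names are arbitrary.

```python
import xml.etree.ElementTree as ET

root = ET.Element("p")
root.text = "a < b"
br = ET.SubElement(root, "br")
br.tail = "tail"

# Same tree, three serializers selected through the method keyword.
as_xml = ET.tostring(root, encoding="unicode")                  # "xml" is the default
as_html = ET.tostring(root, encoding="unicode", method="html")  # empty tags are not closed
as_text = ET.tostring(root, encoding="unicode", method="text")  # text content only

# A real encoding name yields bytes instead of str.
as_bytes = ET.tostring(root, encoding="utf-8")

print(as_xml)
print(as_html)
print(as_text)
```

The "text" method skips escaping entirely, so it is only suitable when markup is not wanted in the result.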
##
# Generates a string representation of an XML element, including all
# subelements. The string is returned as a sequence of string fragments.
#
# @param element An Element instance.
# @keyparam encoding Optional output encoding (default is US-ASCII).
# @keyparam method Optional output method ("xml", "html", "text" or
# "c14n"; default is "xml").
# @return A sequence object containing the XML data.
# @defreturn sequence
# @since 1.3
def tostringlist(element, encoding=None, method=None):
class dummy:
pass
data = []
file = dummy()
file.write = data.append
ElementTree(element).write(file, encoding, method=method)
# FIXME: merge small fragments into larger parts
return data
##
# Writes an element tree or element structure to sys.stdout. This
# function should be used for debugging only.
# <p>
# The exact output format is implementation dependent. In this
# version, it's written as an ordinary XML file.
#
# @param elem An element tree or an individual element.
def dump(elem):
# debugging
if not isinstance(elem, ElementTree):
elem = ElementTree(elem)
elem.write(sys.stdout)
tail = elem.getroot().tail
if not tail or tail[-1] != "\n":
sys.stdout.write("\n")
# --------------------------------------------------------------------
# parsing
##
# Parses an XML document into an element tree.
#
# @param source A filename or file object containing XML data.
# @param parser An optional parser instance. If not given, the
# standard {@link XMLTreeBuilder} parser is used.
# standard {@link XMLParser} parser is used.
# @return An ElementTree instance
def parse(source, parser=None):
......@@ -853,18 +1212,25 @@ def parse(source, parser=None):
# @param source A filename or file object containing XML data.
# @param events A list of events to report back. If omitted, only "end"
# events are reported.
# @param parser An optional parser instance. If not given, the
# standard {@link XMLParser} parser is used.
# @return A (event, elem) iterator.
class iterparse:
def __init__(self, source, events=None):
def iterparse(source, events=None, parser=None):
if not hasattr(source, "read"):
source = open(source, "rb")
if not parser:
parser = XMLParser(target=TreeBuilder())
return _IterParseIterator(source, events, parser)
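The event stream produced by iterparse can be illustrated like this. It is a sketch against the public API of a current Python 3; the document and its namespace URI are invented for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document; any object with a read() method works as a source.
data = io.BytesIO(
    b'<root xmlns:d="http://example.com/d"><d:item>1</d:item></root>'
)

seen = []
for event, payload in ET.iterparse(data, events=("start-ns", "start", "end")):
    if event == "start-ns":
        prefix, uri = payload              # payload is a (prefix, uri) tuple
        seen.append((event, prefix))
    else:
        seen.append((event, payload.tag))  # payload is an Element

for entry in seen:
    print(entry)
```

When events is omitted, only "end" events are reported, which is usually enough for processing elements once they are complete.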
class _IterParseIterator:
def __init__(self, source, events, parser):
self._file = source
self._events = []
self._index = 0
self.root = self._root = None
self._parser = XMLTreeBuilder()
self._parser = parser
# wire up the parser for event reporting
parser = self._parser._parser
append = self._events.append
......@@ -891,16 +1257,14 @@ class iterparse:
parser.EndElementHandler = handler
elif event == "start-ns":
def handler(prefix, uri, event=event, append=append):
try:
uri = _encode(uri, "ascii")
except UnicodeError:
pass
append((event, (prefix or "", uri)))
append((event, (prefix or "", uri or "")))
parser.StartNamespaceDeclHandler = handler
elif event == "end-ns":
def handler(prefix, event=event, append=append):
append((event, None))
parser.EndNamespaceDeclHandler = handler
else:
raise ValueError("unknown event %r" % event)
def __next__(self):
while 1:
......@@ -909,10 +1273,7 @@ class iterparse:
except IndexError:
if self._parser is None:
self.root = self._root
try:
raise StopIteration
except NameError:
raise IndexError
# load event buffer
del self._events[:]
self._index = 0
......@@ -926,24 +1287,22 @@ class iterparse:
self._index = self._index + 1
return item
try:
iter
def __iter__(self):
return self
except NameError:
def __getitem__(self, index):
return self.__next__()
##
# Parses an XML document from a string constant. This function can
# be used to embed "XML literals" in Python code.
#
# @param source A string containing XML data.
# @param parser An optional parser instance. If not given, the
# standard {@link XMLParser} parser is used.
# @return An Element instance.
# @defreturn Element
def XML(text):
parser = XMLTreeBuilder()
def XML(text, parser=None):
if not parser:
parser = XMLParser(target=TreeBuilder())
parser.feed(text)
return parser.close()
......@@ -952,15 +1311,18 @@ def XML(text):
# a dictionary which maps from element id:s to elements.
#
# @param source A string containing XML data.
# @param parser An optional parser instance. If not given, the
# standard {@link XMLParser} parser is used.
# @return A tuple containing an Element instance and a dictionary.
# @defreturn (Element, dictionary)
def XMLID(text):
parser = XMLTreeBuilder()
def XMLID(text, parser=None):
if not parser:
parser = XMLParser(target=TreeBuilder())
parser.feed(text)
tree = parser.close()
ids = {}
for elem in tree.getiterator():
for elem in tree.iter():
id = elem.get("id")
if id:
ids[id] = elem
......@@ -977,25 +1339,23 @@ def XMLID(text):
fromstring = XML
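For comparison, a small sketch of XML, its alias fromstring, and XMLID, using the public API of a current Python 3. The sample document is made up.

```python
import xml.etree.ElementTree as ET

SOURCE = '<doc><sec id="intro">Hi</sec><sec id="end">Bye</sec><sec/></doc>'

# XML() and fromstring() are the same function and return the root Element.
root = ET.XML(SOURCE)
same_function = ET.fromstring is ET.XML

# XMLID() additionally returns a dict mapping id attribute values to elements.
tree, ids = ET.XMLID(SOURCE)
intro_text = ids["intro"].text

print(root.tag, len(root), sorted(ids), intro_text)
```

Elements without an id attribute, such as the last sec here, are simply left out of the dictionary.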
##
# Generates a string representation of an XML element, including all
# subelements. If encoding is None, the return type is a string;
# otherwise it is a bytes array.
# Parses an XML document from a sequence of string fragments.
#
# @param element An Element instance.
# @return An (optionally) encoded string containing the XML data.
# @defreturn string
# @param sequence A list or other sequence containing XML data fragments.
# @param parser An optional parser instance. If not given, the
# standard {@link XMLParser} parser is used.
# @return An Element instance.
# @defreturn Element
# @since 1.3
def tostring(element, encoding=None):
class dummy:
pass
data = []
file = dummy()
file.write = data.append
ElementTree(element).write(file, encoding)
if encoding:
return b"".join(data)
else:
return "".join(data)
def fromstringlist(sequence, parser=None):
if not parser:
parser = XMLParser(target=TreeBuilder())
for text in sequence:
parser.feed(text)
return parser.close()
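A minimal sketch of fromstringlist, assuming a current Python 3. The fragments are deliberately split in the middle of a tag to show that fragment boundaries do not need to line up with markup boundaries.

```python
import xml.etree.ElementTree as ET

# Each fragment is fed to the parser in turn; the split inside "<item>"
# is intentional.
fragments = ["<root><it", "em>one</item>", "<item>two</item></root>"]

root = ET.fromstringlist(fragments)
texts = [item.text for item in root]
print(texts)
```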
# --------------------------------------------------------------------
##
# Generic element structure builder. This builder converts a sequence
......@@ -1016,11 +1376,11 @@ class TreeBuilder:
self._last = None # last element
self._tail = None # true if we're after an end tag
if element_factory is None:
element_factory = _ElementInterface
element_factory = Element
self._factory = element_factory
##
# Flushes the parser buffers, and returns the toplevel documen
# Flushes the builder buffers, and returns the toplevel document
# element.
#
# @return An Element instance.
......@@ -1028,7 +1388,7 @@ class TreeBuilder:
def close(self):
assert len(self._elem) == 0, "missing end tags"
assert self._last != None, "missing toplevel element"
assert self._last is not None, "missing toplevel element"
return self._last
def _flush(self):
......@@ -1093,28 +1453,39 @@ class TreeBuilder:
# instance of the standard {@link #TreeBuilder} class.
# @keyparam html Predefine HTML entities. This flag is not supported
# by the current implementation.
# @keyparam encoding Optional encoding. If given, the value overrides
# the encoding specified in the XML file.
# @see #ElementTree
# @see #TreeBuilder
class XMLTreeBuilder:
class XMLParser:
def __init__(self, html=0, target=None):
def __init__(self, html=0, target=None, encoding=None):
try:
from xml.parsers import expat
except ImportError:
try:
import pyexpat as expat
except ImportError:
raise ImportError(
"No module named expat; use SimpleXMLTreeBuilder instead"
)
self._parser = parser = expat.ParserCreate(None, "}")
parser = expat.ParserCreate(encoding, "}")
if target is None:
target = TreeBuilder()
self._target = target
# underscored names are provided for compatibility only
self.parser = self._parser = parser
self.target = self._target = target
self._error = expat.error
self._names = {} # name memo cache
# callbacks
parser.DefaultHandlerExpand = self._default
parser.StartElementHandler = self._start
parser.EndElementHandler = self._end
parser.CharacterDataHandler = self._data
# optional callbacks
parser.CommentHandler = self._comment
parser.ProcessingInstructionHandler = self._pi
# let expat do the buffering, if supported
try:
self._parser.buffer_text = 1
......@@ -1127,10 +1498,18 @@ class XMLTreeBuilder:
parser.StartElementHandler = self._start_list
except AttributeError:
pass
encoding = "utf-8"
# target.xml(encoding, None)
self._doctype = None
self.entity = {}
try:
self.version = "Expat %d.%d.%d" % expat.version_info
except AttributeError:
pass # unknown
def _raiseerror(self, value):
err = ParseError(value)
err.code = value.code
err.position = value.lineno, value.offset
raise err
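The code and position attributes set by _raiseerror are visible to callers that catch ParseError. A sketch against a current Python 3, with a deliberately malformed input:

```python
import xml.etree.ElementTree as ET

# The <open> element is never closed, so the parser fails on line 3.
BROKEN = "<root>\n  <open>\n</root>"

code = line = column = None
try:
    ET.fromstring(BROKEN)
except ET.ParseError as err:
    code = err.code               # numeric expat error code
    line, column = err.position   # (line, column) tuple
    print("parse failed:", err)

print(code, line, column)
```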
def _fixname(self, key):
# expand qname, and convert name string to ascii, if possible
......@@ -1149,7 +1528,7 @@ class XMLTreeBuilder:
attrib = {}
for key, value in attrib_in.items():
attrib[fixname(key)] = value
return self._target.start(tag, attrib)
return self.target.start(tag, attrib)
def _start_list(self, tag, attrib_in):
fixname = self._fixname
......@@ -1158,27 +1537,47 @@ class XMLTreeBuilder:
if attrib_in:
for i in range(0, len(attrib_in), 2):
attrib[fixname(attrib_in[i])] = attrib_in[i+1]
return self._target.start(tag, attrib)
return self.target.start(tag, attrib)
def _data(self, text):
return self._target.data(text)
return self.target.data(text)
def _end(self, tag):
return self._target.end(self._fixname(tag))
return self.target.end(self._fixname(tag))
def _comment(self, data):
try:
comment = self.target.comment
except AttributeError:
pass
else:
return comment(data)
def _pi(self, target, data):
try:
pi = self.target.pi
except AttributeError:
pass
else:
return pi(target, data)
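The optional comment and pi callbacks are only invoked when the target object defines them. The following sketch shows a custom target that records every callback; the Recorder class is hypothetical and exists only for this illustration, and the code assumes a current Python 3.

```python
import xml.etree.ElementTree as ET


class Recorder:
    """Hypothetical parser target that records each callback it receives."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):
        self.events.append(("start", tag))

    def end(self, tag):
        self.events.append(("end", tag))

    def data(self, text):
        self.events.append(("data", text))

    def comment(self, text):          # optional callback
        self.events.append(("comment", text))

    def pi(self, target, data):       # optional callback
        self.events.append(("pi", target, data))

    def close(self):
        # Whatever close() returns becomes the result of parser.close().
        return self.events


parser = ET.XMLParser(target=Recorder())
parser.feed("<?xml-stylesheet href='s.css'?><root><!-- note -->text</root>")
events = parser.close()

for entry in events:
    print(entry)
```

A target that omits comment or pi still works; those constructs are then silently skipped.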
def _default(self, text):
prefix = text[:1]
if prefix == "&":
# deal with undefined entities
try:
self._target.data(self.entity[text[1:-1]])
self.target.data(self.entity[text[1:-1]])
except KeyError:
from xml.parsers import expat
raise expat.error(
err = expat.error(
"undefined entity %s: line %d, column %d" %
(text, self._parser.ErrorLineNumber,
self._parser.ErrorColumnNumber)
)
err.code = 11 # XML_ERROR_UNDEFINED_ENTITY
err.lineno = self._parser.ErrorLineNumber
err.offset = self._parser.ErrorColumnNumber
raise err
elif prefix == "<" and text[:9] == "<!DOCTYPE":
self._doctype = [] # inside a doctype declaration
elif self._doctype is not None:
......@@ -1202,18 +1601,31 @@ class XMLTreeBuilder:
return
if pubid:
pubid = pubid[1:-1]
if hasattr(self.target, "doctype"):
self.target.doctype(name, pubid, system[1:-1])
elif self.doctype is not self._XMLParser__doctype:
# warn about deprecated call
self._XMLParser__doctype(name, pubid, system[1:-1])
self.doctype(name, pubid, system[1:-1])
self._doctype = None
##
# Handles a doctype declaration.
# (Deprecated) Handles a doctype declaration.
#
# @param name Doctype name.
# @param pubid Public identifier.
# @param system System identifier.
def doctype(self, name, pubid, system):
pass
"""This method of XMLParser is deprecated."""
warnings.warn(
"This method of XMLParser is deprecated. Define doctype() "
"method on the TreeBuilder target.",
DeprecationWarning,
)
# sentinel, if doctype is redefined in a subclass
__doctype = doctype
##
# Feeds data to the parser.
......@@ -1221,7 +1633,10 @@ class XMLTreeBuilder:
# @param data Encoded data.
def feed(self, data):
try:
self._parser.Parse(data, 0)
except self._error as v:
self._raiseerror(v)
##
# Finishes feeding data to the parser.
......@@ -1230,10 +1645,20 @@ class XMLTreeBuilder:
# @defreturn Element
def close(self):
try:
self._parser.Parse("", 1) # end of data
tree = self._target.close()
del self._target, self._parser # get rid of circular references
except self._error as v:
self._raiseerror(v)
tree = self.target.close()
del self.target, self._parser # get rid of circular references
return tree
# compatibility
XMLParser = XMLTreeBuilder
XMLTreeBuilder = XMLParser
# workaround circular import.
try:
from ElementC14N import _serialize_c14n
_serialize["c14n"] = _serialize_c14n
except ImportError:
pass
# $Id: __init__.py 1821 2004-06-03 16:57:49Z fredrik $
# $Id: __init__.py 3375 2008-02-13 08:05:08Z fredrik $
# elementtree package
# --------------------------------------------------------------------
# The ElementTree toolkit is
#
# Copyright (c) 1999-2004 by Fredrik Lundh
# Copyright (c) 1999-2008 by Fredrik Lundh
#
# By obtaining, using, and/or copying this software and/or its
# associated documentation, you agree that you have read, understood,
......@@ -30,4 +30,4 @@
# --------------------------------------------------------------------
# Licensed to PSF under a Contributor Agreement.
# See http://www.python.org/2.4/license for licensing details.
# See http://www.python.org/psf/license for licensing details.
......@@ -836,7 +836,7 @@ EXTRAPLATDIR= @EXTRAPLATDIR@
MACHDEPS= $(PLATDIR) $(EXTRAPLATDIR)
XMLLIBSUBDIRS= xml xml/dom xml/etree xml/parsers xml/sax
LIBSUBDIRS= tkinter site-packages test test/output test/data \
test/decimaltestdata \
test/decimaltestdata test/xmltestdata \
encodings \
email email/mime email/test email/test/data \
html json json/tests http dbm xmlrpc \
......
......@@ -283,6 +283,9 @@ C-API
Library
-------
- Issue #6472: The xml.etree package is updated to ElementTree 1.3. The
cElementTree module is updated too.
- Issue #7774: Set sys.executable to an empty string if argv[0] has been set to
a nonexistent program name and Python is unable to retrieve the real
program name
......
/*
* ElementTree
* $Id: _elementtree.c 2657 2006-03-12 20:50:32Z fredrik $
* $Id: _elementtree.c 3473 2009-01-11 22:53:55Z fredrik $
*
* elementtree accelerator
*
* History:
* 1999-06-20 fl created (as part of sgmlop)
* 2001-05-29 fl effdom edition
* 2001-06-05 fl backported to unix; fixed bogus free in clear
* 2001-07-10 fl added findall helper
* 2003-02-27 fl elementtree edition (alpha)
* 2004-06-03 fl updates for elementtree 1.2
* 2005-01-05 fl added universal name cache, Element/SubElement factories
* 2005-01-06 fl moved python helpers into C module; removed 1.5.2 support
* 2005-01-07 fl added 2.1 support; work around broken __copy__ in 2.3
* 2005-01-08 fl added makeelement method; fixed path support
* 2005-01-10 fl optimized memory usage
* 2005-01-05 fl major optimization effort
* 2005-01-11 fl first public release (cElementTree 0.8)
* 2005-01-12 fl split element object into base and extras
* 2005-01-13 fl use tagged pointers for tail/text (cElementTree 0.9)
......@@ -35,16 +29,23 @@
* 2005-12-16 fl added support for non-standard encodings
* 2006-03-08 fl fixed a couple of potential null-refs and leaks
* 2006-03-12 fl merge in 2.5 ssize_t changes
* 2007-08-25 fl call custom builder's close method from XMLParser
* 2007-08-31 fl added iter, extend from ET 1.3
* 2007-09-01 fl fixed ParseError exception, setslice source type, etc
* 2007-09-03 fl fixed handling of negative insert indexes
* 2007-09-04 fl added itertext from ET 1.3
* 2007-09-06 fl added position attribute to ParseError exception
* 2008-06-06 fl delay error reporting in iterparse (from Hrvoje Niksic)
*
* Copyright (c) 1999-2006 by Secret Labs AB. All rights reserved.
* Copyright (c) 1999-2006 by Fredrik Lundh.
* Copyright (c) 1999-2009 by Secret Labs AB. All rights reserved.
* Copyright (c) 1999-2009 by Fredrik Lundh.
*
* info@pythonware.com
* http://www.pythonware.com
*/
/* Licensed to PSF under a Contributor Agreement. */
/* See http://www.python.org/2.4/license for licensing details. */
/* See http://www.python.org/psf/license for licensing details. */
#include "Python.h"
......@@ -56,7 +57,7 @@
/* Leave defined to include the expat-based XMLParser type */
#define USE_EXPAT
/* Define to to all expat calls via pyexpat's embedded expat library */
/* Define to do all expat calls via pyexpat's embedded expat library */
/* #define USE_PYEXPAT_CAPI */
/* An element can hold this many children without extra memory
......@@ -93,6 +94,25 @@ do { memory -= size; printf("%8d - %s\n", memory, comment); } while (0)
#define LOCAL(type) static type
#endif
/* compatibility macros */
#if (PY_VERSION_HEX < 0x02060000)
#define Py_REFCNT(ob) (((PyObject*)(ob))->ob_refcnt)
#define Py_TYPE(ob) (((PyObject*)(ob))->ob_type)
#endif
#if (PY_VERSION_HEX < 0x02050000)
typedef int Py_ssize_t;
#define lenfunc inquiry
#endif
#if (PY_VERSION_HEX < 0x02040000)
#define PyDict_CheckExact PyDict_Check
#if !defined(Py_RETURN_NONE)
#define Py_RETURN_NONE return Py_INCREF(Py_None), Py_None
#endif
#endif
/* macros used to store 'join' flags in string object pointers. note
that all use of text and tail as object pointers must be wrapped in
JOIN_OBJ. see comments in the ElementObject definition for more
......@@ -102,9 +122,11 @@ do { memory -= size; printf("%8d - %s\n", memory, comment); } while (0)
#define JOIN_OBJ(p) ((PyObject*) ((Py_uintptr_t) (p) & ~1))
/* glue functions (see the init function for details) */
static PyObject* elementtree_parseerror_obj;
static PyObject* elementtree_copyelement_obj;
static PyObject* elementtree_deepcopy_obj;
static PyObject* elementtree_getiterator_obj;
static PyObject* elementtree_iter_obj;
static PyObject* elementtree_itertext_obj;
static PyObject* elementpath_obj;
/* helpers */
......@@ -188,23 +210,6 @@ list_join(PyObject* list)
return result;
}
#if (PY_VERSION_HEX < 0x02020000)
LOCAL(int)
PyDict_Update(PyObject* dict, PyObject* other)
{
/* PyDict_Update emulation for 2.1 and earlier */
PyObject* res;
res = PyObject_CallMethod(dict, "update", "O", other);
if (!res)
return -1;
Py_DECREF(res);
return 0;
}
#endif
/* -------------------------------------------------------------------- */
/* the element type */
......@@ -407,6 +412,7 @@ element_get_attrib(ElementObject* self)
PyObject* res = self->extra->attrib;
if (res == Py_None) {
Py_DECREF(res);
/* create missing dictionary */
res = PyDict_New();
if (!res)
......@@ -688,6 +694,8 @@ element_deepcopy(ElementObject* self, PyObject* args)
/* add object to memo dictionary (so deepcopy won't visit it again) */
id = PyLong_FromLong((Py_uintptr_t) self);
if (!id)
goto error;
i = PyDict_SetItem(memo, id, (PyObject*) element);
......@@ -711,7 +719,8 @@ checkpath(PyObject* tag)
/* check if a tag contains an xpath character */
#define PATHCHAR(ch) (ch == '/' || ch == '*' || ch == '[' || ch == '@')
#define PATHCHAR(ch) \
(ch == '/' || ch == '*' || ch == '[' || ch == '@' || ch == '.')
if (PyUnicode_Check(tag)) {
Py_UNICODE *p = PyUnicode_AS_UNICODE(tag);
......@@ -741,18 +750,52 @@ checkpath(PyObject* tag)
return 1; /* unknown type; might be path expression */
}
static PyObject*
element_extend(ElementObject* self, PyObject* args)
{
PyObject* seq;
Py_ssize_t i, seqlen = 0;
PyObject* seq_in;
if (!PyArg_ParseTuple(args, "O:extend", &seq_in))
return NULL;
seq = PySequence_Fast(seq_in, "");
if (!seq) {
PyErr_Format(
PyExc_TypeError,
"expected sequence, not \"%.200s\"", Py_TYPE(seq_in)->tp_name
);
return NULL;
}
seqlen = PySequence_Size(seq);
for (i = 0; i < seqlen; i++) {
PyObject* element = PySequence_Fast_GET_ITEM(seq, i);
if (element_add_subelement(self, element) < 0) {
Py_DECREF(seq);
return NULL;
}
}
Py_DECREF(seq);
Py_RETURN_NONE;
}
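From Python, the extend method added above behaves as in the sketch below. This assumes a current Python 3, where the accelerated Element type is used by default; the tag names are arbitrary.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")

# extend() appends every element of a sequence as a child.
root.extend([ET.Element("a"), ET.Element("b")])
root.extend((ET.Element("c"),))
tags = [child.tag for child in root]

# A non-sequence argument is rejected with TypeError.
type_error = None
try:
    root.extend(5)
except TypeError as exc:
    type_error = exc

print(tags, type_error)
```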
static PyObject*
element_find(ElementObject* self, PyObject* args)
{
int i;
PyObject* tag;
if (!PyArg_ParseTuple(args, "O:find", &tag))
PyObject* namespaces = Py_None;
if (!PyArg_ParseTuple(args, "O|O:find", &tag, &namespaces))
return NULL;
if (checkpath(tag))
if (checkpath(tag) || namespaces != Py_None)
return PyObject_CallMethod(
elementpath_obj, "find", "OO", self, tag
elementpath_obj, "find", "OOO", self, tag, namespaces
);
if (!self->extra)
......@@ -777,12 +820,13 @@ element_findtext(ElementObject* self, PyObject* args)
PyObject* tag;
PyObject* default_value = Py_None;
if (!PyArg_ParseTuple(args, "O|O:findtext", &tag, &default_value))
PyObject* namespaces = Py_None;
if (!PyArg_ParseTuple(args, "O|OO:findtext", &tag, &default_value, &namespaces))
return NULL;
if (checkpath(tag))
if (checkpath(tag) || namespaces != Py_None)
return PyObject_CallMethod(
elementpath_obj, "findtext", "OOO", self, tag, default_value
elementpath_obj, "findtext", "OOOO", self, tag, default_value, namespaces
);
if (!self->extra) {
......@@ -813,12 +857,13 @@ element_findall(ElementObject* self, PyObject* args)
PyObject* out;
PyObject* tag;
if (!PyArg_ParseTuple(args, "O:findall", &tag))
PyObject* namespaces = Py_None;
if (!PyArg_ParseTuple(args, "O|O:findall", &tag, &namespaces))
return NULL;
if (checkpath(tag))
if (checkpath(tag) || namespaces != Py_None)
return PyObject_CallMethod(
elementpath_obj, "findall", "OO", self, tag
elementpath_obj, "findall", "OOO", self, tag, namespaces
);
out = PyList_New(0);
......@@ -842,6 +887,19 @@ element_findall(ElementObject* self, PyObject* args)
return out;
}
static PyObject*
element_iterfind(ElementObject* self, PyObject* args)
{
PyObject* tag;
PyObject* namespaces = Py_None;
if (!PyArg_ParseTuple(args, "O|O:iterfind", &tag, &namespaces))
return NULL;
return PyObject_CallMethod(
elementpath_obj, "iterfind", "OOO", self, tag, namespaces
);
}
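The optional namespaces argument accepted by find, findtext, findall and iterfind maps prefixes used in the path to namespace URIs. A sketch with an invented URI, assuming a current Python 3; note that the prefix in the mapping ("p") does not have to match the prefix in the document ("d").

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<r xmlns:d="http://example.com/d"><d:item>1</d:item><d:item>2</d:item></r>'
)

# Hypothetical prefix mapping used only for path lookups.
ns = {"p": "http://example.com/d"}

first = root.find("p:item", ns)
all_texts = [e.text for e in root.findall("p:item", ns)]
lazy_texts = [e.text for e in root.iterfind("p:item", ns)]
first_text = root.findtext("p:item", default="", namespaces=ns)

print(first.tag, all_texts, lazy_texts, first_text)
```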
static PyObject*
element_get(ElementObject* self, PyObject* args)
{
......@@ -870,6 +928,8 @@ element_getchildren(ElementObject* self, PyObject* args)
int i;
PyObject* list;
/* FIXME: report as deprecated? */
if (!PyArg_ParseTuple(args, ":getchildren"))
return NULL;
......@@ -890,18 +950,18 @@ element_getchildren(ElementObject* self, PyObject* args)
}
static PyObject*
element_getiterator(ElementObject* self, PyObject* args)
element_iter(ElementObject* self, PyObject* args)
{
PyObject* result;
PyObject* tag = Py_None;
if (!PyArg_ParseTuple(args, "|O:getiterator", &tag))
if (!PyArg_ParseTuple(args, "|O:iter", &tag))
return NULL;
if (!elementtree_getiterator_obj) {
if (!elementtree_iter_obj) {
PyErr_SetString(
PyExc_RuntimeError,
"getiterator helper not found"
"iter helper not found"
);
return NULL;
}
......@@ -913,61 +973,58 @@ element_getiterator(ElementObject* self, PyObject* args)
Py_INCREF(self); PyTuple_SET_ITEM(args, 0, (PyObject*) self);
Py_INCREF(tag); PyTuple_SET_ITEM(args, 1, (PyObject*) tag);
result = PyObject_CallObject(elementtree_getiterator_obj, args);
result = PyObject_CallObject(elementtree_iter_obj, args);
Py_DECREF(args);
return result;
}
static PyObject*
element_getitem(PyObject* self_, Py_ssize_t index)
element_itertext(ElementObject* self, PyObject* args)
{
ElementObject* self = (ElementObject*) self_;
PyObject* result;
if (!self->extra || index < 0 || index >= self->extra->length) {
if (!PyArg_ParseTuple(args, ":itertext"))
return NULL;
if (!elementtree_itertext_obj) {
PyErr_SetString(
PyExc_IndexError,
"child index out of range"
PyExc_RuntimeError,
"itertext helper not found"
);
return NULL;
}
Py_INCREF(self->extra->children[index]);
return self->extra->children[index];
args = PyTuple_New(1);
if (!args)
return NULL;
Py_INCREF(self); PyTuple_SET_ITEM(args, 0, (PyObject*) self);
result = PyObject_CallObject(elementtree_itertext_obj, args);
Py_DECREF(args);
return result;
}
static PyObject*
element_getslice(PyObject* self_, Py_ssize_t start, Py_ssize_t end)
element_getitem(PyObject* self_, Py_ssize_t index)
{
ElementObject* self = (ElementObject*) self_;
Py_ssize_t i;
PyObject* list;
if (!self->extra)
return PyList_New(0);
/* standard clamping */
if (start < 0)
start = 0;
if (end < 0)
end = 0;
if (end > self->extra->length)
end = self->extra->length;
if (start > end)
start = end;
list = PyList_New(end - start);
if (!list)
if (!self->extra || index < 0 || index >= self->extra->length) {
PyErr_SetString(
PyExc_IndexError,
"child index out of range"
);
return NULL;
for (i = start; i < end; i++) {
PyObject* item = self->extra->children[i];
Py_INCREF(item);
PyList_SET_ITEM(list, i - start, item);
}
return list;
Py_INCREF(self->extra->children[index]);
return self->extra->children[index];
}
static PyObject*
......@@ -984,8 +1041,11 @@ element_insert(ElementObject* self, PyObject* args)
if (!self->extra)
element_new_extra(self, NULL);
if (index < 0) {
index += self->extra->length;
if (index < 0)
index = 0;
}
if (index > self->extra->length)
index = self->extra->length;
......@@ -1156,104 +1216,217 @@ element_set(ElementObject* self, PyObject* args)
}
static int
element_setslice(PyObject* self_, Py_ssize_t start, Py_ssize_t end, PyObject* item)
element_setitem(PyObject* self_, Py_ssize_t index, PyObject* item)
{
ElementObject* self = (ElementObject*) self_;
Py_ssize_t i, new, old;
int i;
PyObject* old;
if (!self->extra || index < 0 || index >= self->extra->length) {
PyErr_SetString(
PyExc_IndexError,
"child assignment index out of range");
return -1;
}
old = self->extra->children[index];
if (item) {
Py_INCREF(item);
self->extra->children[index] = item;
} else {
self->extra->length--;
for (i = index; i < self->extra->length; i++)
self->extra->children[i] = self->extra->children[i+1];
}
Py_DECREF(old);
return 0;
}
static PyObject*
element_subscr(PyObject* self_, PyObject* item)
{
ElementObject* self = (ElementObject*) self_;
#if (PY_VERSION_HEX < 0x02050000)
if (PyInt_Check(item) || PyLong_Check(item)) {
long i = PyInt_AsLong(item);
#else
if (PyIndex_Check(item)) {
Py_ssize_t i = PyNumber_AsSsize_t(item, PyExc_IndexError);
#endif
if (i == -1 && PyErr_Occurred()) {
return NULL;
}
if (i < 0 && self->extra)
i += self->extra->length;
return element_getitem(self_, i);
}
else if (PySlice_Check(item)) {
Py_ssize_t start, stop, step, slicelen, cur, i;
PyObject* list;
if (!self->extra)
return PyList_New(0);
if (PySlice_GetIndicesEx((PySliceObject *)item,
self->extra->length,
&start, &stop, &step, &slicelen) < 0) {
return NULL;
}
if (slicelen <= 0)
return PyList_New(0);
else {
list = PyList_New(slicelen);
if (!list)
return NULL;
for (cur = start, i = 0; i < slicelen;
cur += step, i++) {
PyObject* item = self->extra->children[cur];
Py_INCREF(item);
PyList_SET_ITEM(list, i, item);
}
return list;
}
}
else {
PyErr_SetString(PyExc_TypeError,
"element indices must be integers");
return NULL;
}
}
static int
element_ass_subscr(PyObject* self_, PyObject* item, PyObject* value)
{
ElementObject* self = (ElementObject*) self_;
#if (PY_VERSION_HEX < 0x02050000)
if (PyInt_Check(item) || PyLong_Check(item)) {
long i = PyInt_AsLong(item);
#else
if (PyIndex_Check(item)) {
Py_ssize_t i = PyNumber_AsSsize_t(item, PyExc_IndexError);
#endif
if (i == -1 && PyErr_Occurred()) {
return -1;
}
if (i < 0 && self->extra)
i += self->extra->length;
return element_setitem(self_, i, value);
}
else if (PySlice_Check(item)) {
Py_ssize_t start, stop, step, slicelen, newlen, cur, i;
PyObject* recycle = NULL;
PyObject* seq = NULL;
if (!self->extra)
element_new_extra(self, NULL);
/* standard clamping */
if (start < 0)
start = 0;
if (end < 0)
end = 0;
if (end > self->extra->length)
end = self->extra->length;
if (start > end)
start = end;
old = end - start;
if (item == NULL)
new = 0;
else if (PyList_CheckExact(item)) {
new = PyList_GET_SIZE(item);
} else {
/* FIXME: support arbitrary sequences? */
if (PySlice_GetIndicesEx((PySliceObject *)item,
self->extra->length,
&start, &stop, &step, &slicelen) < 0) {
return -1;
}
if (value == NULL)
newlen = 0;
else {
seq = PySequence_Fast(value, "");
if (!seq) {
PyErr_Format(
PyExc_TypeError,
"expected list, not \"%.200s\"", Py_TYPE(item)->tp_name
"expected sequence, not \"%.200s\"", Py_TYPE(value)->tp_name
);
return -1;
}
newlen = PySequence_Size(seq);
}
if (step != 1 && newlen != slicelen)
{
PyErr_Format(PyExc_ValueError,
#if (PY_VERSION_HEX < 0x02050000)
"attempt to assign sequence of size %d "
"to extended slice of size %d",
#else
"attempt to assign sequence of size %zd "
"to extended slice of size %zd",
#endif
newlen, slicelen
);
return -1;
}
if (old > 0) {
/* Resize before creating the recycle bin, to prevent refleaks. */
if (newlen > slicelen) {
if (element_resize(self, newlen - slicelen) < 0) {
if (seq) {
Py_DECREF(seq);
}
return -1;
}
}
if (slicelen > 0) {
/* to avoid recursive calls to this method (via decref), move
old items to the recycle bin here, and get rid of them when
we're done modifying the element */
recycle = PyList_New(old);
for (i = 0; i < old; i++)
PyList_SET_ITEM(recycle, i, self->extra->children[i + start]);
recycle = PyList_New(slicelen);
if (!recycle) {
if (seq) {
Py_DECREF(seq);
}
return -1;
}
for (cur = start, i = 0; i < slicelen;
cur += step, i++)
PyList_SET_ITEM(recycle, i, self->extra->children[cur]);
}
if (new < old) {
if (newlen < slicelen) {
/* delete slice */
for (i = end; i < self->extra->length; i++)
self->extra->children[i + new - old] = self->extra->children[i];
} else if (new > old) {
for (i = stop; i < self->extra->length; i++)
self->extra->children[i + newlen - slicelen] = self->extra->children[i];
} else if (newlen > slicelen) {
/* insert slice */
if (element_resize(self, new - old) < 0)
return -1;
for (i = self->extra->length-1; i >= end; i--)
self->extra->children[i + new - old] = self->extra->children[i];
for (i = self->extra->length-1; i >= stop; i--)
self->extra->children[i + newlen - slicelen] = self->extra->children[i];
}
/* replace the slice */
for (i = 0; i < new; i++) {
PyObject* element = PyList_GET_ITEM(item, i);
for (cur = start, i = 0; i < newlen;
cur += step, i++) {
PyObject* element = PySequence_Fast_GET_ITEM(seq, i);
Py_INCREF(element);
self->extra->children[i + start] = element;
self->extra->children[cur] = element;
}
self->extra->length += new - old;
self->extra->length += newlen - slicelen;
if (seq) {
Py_DECREF(seq);
}
/* discard the recycle bin, and everything in it */
Py_XDECREF(recycle);
return 0;
}
static int
element_setitem(PyObject* self_, Py_ssize_t index, PyObject* item)
{
ElementObject* self = (ElementObject*) self_;
int i;
PyObject* old;
if (!self->extra || index < 0 || index >= self->extra->length) {
PyErr_SetString(
PyExc_IndexError,
"child assignment index out of range");
return -1;
}
old = self->extra->children[index];
if (item) {
Py_INCREF(item);
self->extra->children[index] = item;
} else {
self->extra->length--;
for (i = index; i < self->extra->length; i++)
self->extra->children[i] = self->extra->children[i+1];
else {
PyErr_SetString(PyExc_TypeError,
"element indices must be integers");
return -1;
}
Py_DECREF(old);
return 0;
}
static PyMethodDef element_methods[] = {
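Reviewer note: the `element_subscr` / `element_ass_subscr` pair above adds extended-slice support (a step other than 1) to the C `Element`. The sketch below shows the Python-level behaviour this is meant to provide. It assumes a current Python 3 where `xml.etree.ElementTree` exposes the same semantics; the tag names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Build a parent with six children: c0 .. c5 (hypothetical tag names).
root = ET.Element("root")
root.extend([ET.Element("c%d" % i) for i in range(6)])

# Reading an extended slice returns a plain list of child elements.
evens = root[::2]

# Assigning to an extended slice requires a sequence of the same length.
root[::2] = [ET.Element("x"), ET.Element("y"), ET.Element("z")]
tags = [child.tag for child in root]

# A length mismatch on an extended slice is rejected with ValueError,
# mirroring the "attempt to assign sequence of size ..." check in the C code.
try:
    root[::2] = [ET.Element("only-one")]
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print([e.tag for e in evens], tags, mismatch_rejected)
```

With a step of 1 the lengths may differ, and the children after the slice are shifted left or right, which is what the "delete slice" and "insert slice" branches handle.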
......@@ -1268,10 +1441,15 @@ static PyMethodDef element_methods[] = {
{"findall", (PyCFunction) element_findall, METH_VARARGS},
{"append", (PyCFunction) element_append, METH_VARARGS},
{"extend", (PyCFunction) element_extend, METH_VARARGS},
{"insert", (PyCFunction) element_insert, METH_VARARGS},
{"remove", (PyCFunction) element_remove, METH_VARARGS},
{"getiterator", (PyCFunction) element_getiterator, METH_VARARGS},
{"iter", (PyCFunction) element_iter, METH_VARARGS},
{"itertext", (PyCFunction) element_itertext, METH_VARARGS},
{"iterfind", (PyCFunction) element_iterfind, METH_VARARGS},
{"getiterator", (PyCFunction) element_iter, METH_VARARGS},
{"getchildren", (PyCFunction) element_getchildren, METH_VARARGS},
{"items", (PyCFunction) element_items, METH_VARARGS},
......@@ -1306,21 +1484,37 @@ element_getattro(ElementObject* self, PyObject* nameobj)
if (PyUnicode_Check(nameobj))
name = _PyUnicode_AsString(nameobj);
if (strcmp(name, "tag") == 0)
/* handle common attributes first */
if (strcmp(name, "tag") == 0) {
res = self->tag;
else if (strcmp(name, "text") == 0)
Py_INCREF(res);
return res;
} else if (strcmp(name, "text") == 0) {
res = element_get_text(self);
else if (strcmp(name, "tail") == 0) {
Py_INCREF(res);
return res;
}
/* methods */
res = PyObject_GenericGetAttr((PyObject*) self, nameobj);
if (res)
return res;
/* less common attributes */
if (strcmp(name, "tail") == 0) {
PyErr_Clear();
res = element_get_tail(self);
} else if (strcmp(name, "attrib") == 0) {
PyErr_Clear();
if (!self->extra)
element_new_extra(self, NULL);
res = element_get_attrib(self);
} else {
return PyObject_GenericGetAttr((PyObject*) self, nameobj);
}
Py_XINCREF(res);
if (!res)
return NULL;
Py_INCREF(res);
return res;
}
......@@ -1366,9 +1560,15 @@ static PySequenceMethods element_as_sequence = {
0, /* sq_concat */
0, /* sq_repeat */
element_getitem,
element_getslice,
0,
element_setitem,
element_setslice,
0,
};
static PyMappingMethods element_as_mapping = {
(lenfunc) element_length,
(binaryfunc) element_subscr,
(objobjargproc) element_ass_subscr,
};
static PyTypeObject Element_Type = {
......@@ -1383,7 +1583,7 @@ static PyTypeObject Element_Type = {
(reprfunc)element_repr, /* tp_repr */
0, /* tp_as_number */
&element_as_sequence, /* tp_as_sequence */
0, /* tp_as_mapping */
&element_as_mapping, /* tp_as_mapping */
0, /* tp_hash */
0, /* tp_call */
0, /* tp_str */
......@@ -1537,7 +1737,7 @@ treebuilder_handle_start(TreeBuilderObject* self, PyObject* tag,
} else {
if (self->root) {
PyErr_SetString(
PyExc_SyntaxError,
elementtree_parseerror_obj,
"multiple elements on top level"
);
goto error;
......@@ -1678,7 +1878,7 @@ treebuilder_handle_end(TreeBuilderObject* self, PyObject* tag)
LOCAL(void)
treebuilder_handle_namespace(TreeBuilderObject* self, int start,
const char* prefix, const char *uri)
PyObject *prefix, PyObject *uri)
{
PyObject* res;
PyObject* action;
......@@ -1691,8 +1891,7 @@ treebuilder_handle_namespace(TreeBuilderObject* self, int start,
if (!self->start_ns_event_obj)
return;
action = self->start_ns_event_obj;
/* FIXME: prefix and uri use utf-8 encoding! */
parcel = Py_BuildValue("ss", (prefix) ? prefix : "", uri);
parcel = Py_BuildValue("OO", prefix, uri);
if (!parcel)
return;
Py_INCREF(action);
......@@ -1852,6 +2051,7 @@ typedef struct {
PyObject* names;
PyObject* handle_xml;
PyObject* handle_start;
PyObject* handle_data;
PyObject* handle_end;
......@@ -1859,6 +2059,8 @@ typedef struct {
PyObject* handle_comment;
PyObject* handle_pi;
PyObject* handle_close;
} XMLParserObject;
static PyTypeObject XMLParser_Type;
......@@ -1930,6 +2132,36 @@ makeuniversal(XMLParserObject* self, const char* string)
return value;
}
static void
expat_set_error(const char* message, int line, int column)
{
PyObject *error;
PyObject *position;
char buffer[256];
sprintf(buffer, "%s: line %d, column %d", message, line, column);
error = PyObject_CallFunction(elementtree_parseerror_obj, "s", buffer);
if (!error)
return;
/* add position attribute */
position = Py_BuildValue("(ii)", line, column);
if (!position) {
Py_DECREF(error);
return;
}
if (PyObject_SetAttrString(error, "position", position) == -1) {
Py_DECREF(error);
Py_DECREF(position);
return;
}
Py_DECREF(position);
PyErr_SetObject(elementtree_parseerror_obj, error);
Py_DECREF(error);
}
/* -------------------------------------------------------------------- */
/* handlers */
......@@ -1960,10 +2192,12 @@ expat_default_handler(XMLParserObject* self, const XML_Char* data_in,
else
res = NULL;
Py_XDECREF(res);
} else {
PyErr_Format(
PyExc_SyntaxError, "undefined entity &%s;: line %ld, column %ld",
PyBytes_AS_STRING(key),
} else if (!PyErr_Occurred()) {
/* Report the first error, not the last */
char message[128];
sprintf(message, "undefined entity &%.100s;", _PyUnicode_AsString(key));
expat_set_error(
message,
EXPAT(GetErrorLineNumber)(self->parser),
EXPAT(GetErrorColumnNumber)(self->parser)
);
......@@ -2018,9 +2252,15 @@ expat_start_handler(XMLParserObject* self, const XML_Char* tag_in,
/* shortcut */
res = treebuilder_handle_start((TreeBuilderObject*) self->target,
tag, attrib);
else if (self->handle_start)
else if (self->handle_start) {
if (attrib == Py_None) {
Py_DECREF(attrib);
attrib = PyDict_New();
if (!attrib)
return;
}
res = PyObject_CallFunction(self->handle_start, "OO", tag, attrib);
else
} else
res = NULL;
Py_DECREF(tag);
......@@ -2080,9 +2320,28 @@ static void
expat_start_ns_handler(XMLParserObject* self, const XML_Char* prefix,
const XML_Char *uri)
{
PyObject* sprefix = NULL;
PyObject* suri = NULL;
suri = PyUnicode_DecodeUTF8(uri, strlen(uri), "strict");
if (!suri)
return;
if (prefix)
sprefix = PyUnicode_DecodeUTF8(prefix, strlen(prefix), "strict");
else
sprefix = PyUnicode_FromString("");
if (!sprefix) {
Py_DECREF(suri);
return;
}
treebuilder_handle_namespace(
(TreeBuilderObject*) self->target, 1, prefix, uri
(TreeBuilderObject*) self->target, 1, sprefix, suri
);
Py_DECREF(sprefix);
Py_DECREF(suri);
}
static void
......@@ -2245,6 +2504,7 @@ xmlparser(PyObject* self_, PyObject* args, PyObject* kw)
self->handle_end = PyObject_GetAttrString(target, "end");
self->handle_comment = PyObject_GetAttrString(target, "comment");
self->handle_pi = PyObject_GetAttrString(target, "pi");
self->handle_close = PyObject_GetAttrString(target, "close");
PyErr_Clear();
......@@ -2288,6 +2548,7 @@ xmlparser_dealloc(XMLParserObject* self)
{
EXPAT(ParserFree)(self->parser);
Py_XDECREF(self->handle_close);
Py_XDECREF(self->handle_pi);
Py_XDECREF(self->handle_comment);
Py_XDECREF(self->handle_end);
......@@ -2318,8 +2579,7 @@ expat_parse(XMLParserObject* self, char* data, int data_len, int final)
return NULL;
if (!ok) {
PyErr_Format(
PyExc_SyntaxError, "%s: line %ld, column %ld",
expat_set_error(
EXPAT(ErrorString)(EXPAT(GetErrorCode)(self->parser)),
EXPAT(GetErrorLineNumber)(self->parser),
EXPAT(GetErrorColumnNumber)(self->parser)
......@@ -2340,12 +2600,16 @@ xmlparser_close(XMLParserObject* self, PyObject* args)
return NULL;
res = expat_parse(self, "", 0, 1);
if (!res)
return NULL;
if (res && TreeBuilder_CheckExact(self->target)) {
if (TreeBuilder_CheckExact(self->target)) {
Py_DECREF(res);
return treebuilder_done((TreeBuilderObject*) self->target);
}
} if (self->handle_close) {
Py_DECREF(res);
return PyObject_CallFunction(self->handle_close, "");
} else
return res;
}
......@@ -2458,7 +2722,7 @@ xmlparser_setevents(XMLParserObject* self, PyObject* args)
if (event_set == Py_None) {
/* default is "end" only */
target->end_event_obj = PyBytes_FromString("end");
target->end_event_obj = PyUnicode_FromString("end");
Py_RETURN_NONE;
}
......@@ -2468,9 +2732,13 @@ xmlparser_setevents(XMLParserObject* self, PyObject* args)
for (i = 0; i < PyTuple_GET_SIZE(event_set); i++) {
PyObject* item = PyTuple_GET_ITEM(event_set, i);
char* event;
if (!PyBytes_Check(item))
goto error;
if (PyUnicode_Check(item)) {
event = _PyUnicode_AsString(item);
} else if (PyBytes_Check(item))
event = PyBytes_AS_STRING(item);
else {
goto error;
}
if (strcmp(event, "start") == 0) {
Py_INCREF(item);
target->start_event_obj = item;
......@@ -2542,7 +2810,7 @@ xmlparser_getattro(XMLParserObject* self, PyObject* nameobj)
char buffer[100];
sprintf(buffer, "Expat %d.%d.%d", XML_MAJOR_VERSION,
XML_MINOR_VERSION, XML_MICRO_VERSION);
return PyBytes_FromString(buffer);
return PyUnicode_DecodeUTF8(buffer, strlen(buffer), "strict");
} else {
return PyObject_GenericGetAttr((PyObject*) self, nameobj);
}
......@@ -2617,9 +2885,6 @@ PyInit__elementtree(void)
PyObject* m;
PyObject* g;
char* bootstrap;
#if defined(USE_PYEXPAT_CAPI)
struct PyExpat_CAPI* capi;
#endif
/* Initialize object types */
if (PyType_Ready(&TreeBuilder_Type) < 0)
......@@ -2651,10 +2916,6 @@ PyInit__elementtree(void)
bootstrap = (
#if (PY_VERSION_HEX >= 0x02020000 && PY_VERSION_HEX < 0x02030000)
"from __future__ import generators\n" /* enable yield under 2.2 */
#endif
"from copy import copy, deepcopy\n"
"try:\n"
......@@ -2672,11 +2933,14 @@ PyInit__elementtree(void)
" def copyelement(elem):\n"
" return elem\n"
"def Comment(text=None):\n" /* public */
"class CommentProxy:\n"
" def __call__(self, text=None):\n"
" element = cElementTree.Element(ET.Comment)\n"
" element.text = text\n"
" return element\n"
"cElementTree.Comment = Comment\n"
" def __eq__(self, other):\n"
" return ET.Comment == other\n"
"cElementTree.Comment = CommentProxy()\n"
"class ElementTree(ET.ElementTree):\n" /* public */
" def parse(self, source, parser=None):\n"
......@@ -2695,23 +2959,23 @@ PyInit__elementtree(void)
" return self._root\n"
"cElementTree.ElementTree = ElementTree\n"
"def getiterator(node, tag=None):\n" /* helper */
"def iter(node, tag=None):\n" /* helper */
" if tag == '*':\n"
" tag = None\n"
#if (PY_VERSION_HEX < 0x02020000)
" nodes = []\n" /* 2.1 doesn't have yield */
" if tag is None or node.tag == tag:\n"
" nodes.append(node)\n"
" for node in node:\n"
" nodes.extend(getiterator(node, tag))\n"
" return nodes\n"
#else
" if tag is None or node.tag == tag:\n"
" yield node\n"
" for node in node:\n"
" for node in getiterator(node, tag):\n"
" for node in iter(node, tag):\n"
" yield node\n"
#endif
"def itertext(node):\n" /* helper */
" if node.text:\n"
" yield node.text\n"
" for e in node:\n"
" for s in e.itertext():\n"
" yield s\n"
" if e.tail:\n"
" yield e.tail\n"
"def parse(source, parser=None):\n" /* public */
" tree = ElementTree()\n"
......@@ -2719,48 +2983,52 @@ PyInit__elementtree(void)
" return tree\n"
"cElementTree.parse = parse\n"
#if (PY_VERSION_HEX < 0x02020000)
"if hasattr(ET, 'iterparse'):\n"
" cElementTree.iterparse = ET.iterparse\n" /* delegate on 2.1 */
#else
"class iterparse(object):\n"
"class iterparse:\n"
" root = None\n"
" def __init__(self, file, events=None):\n"
" if not hasattr(file, 'read'):\n"
" file = open(file, 'rb')\n"
" self._file = file\n"
" self._events = events\n"
" def __iter__(self):\n"
" events = []\n"
" self._events = []\n"
" self._index = 0\n"
" self.root = self._root = None\n"
" b = cElementTree.TreeBuilder()\n"
" p = cElementTree.XMLParser(b)\n"
" p._setevents(events, self._events)\n"
" self._parser = cElementTree.XMLParser(b)\n"
" self._parser._setevents(self._events, events)\n"
" def __next__(self):\n"
" while 1:\n"
" try:\n"
" item = self._events[self._index]\n"
" except IndexError:\n"
" if self._parser is None:\n"
" self.root = self._root\n"
" raise StopIteration\n"
" # load event buffer\n"
" del self._events[:]\n"
" self._index = 0\n"
" data = self._file.read(16384)\n"
" if not data:\n"
" break\n"
" p.feed(data)\n"
" for event in events:\n"
" yield event\n"
" del events[:]\n"
" root = p.close()\n"
" for event in events:\n"
" yield event\n"
" self.root = root\n"
" if data:\n"
" self._parser.feed(data)\n"
" else:\n"
" self._root = self._parser.close()\n"
" self._parser = None\n"
" else:\n"
" self._index = self._index + 1\n"
" return item\n"
" def __iter__(self):\n"
" return self\n"
"cElementTree.iterparse = iterparse\n"
#endif
"def PI(target, text=None):\n" /* public */
" element = cElementTree.Element(ET.ProcessingInstruction)\n"
"class PIProxy:\n"
" def __call__(self, target, text=None):\n"
" element = cElementTree.Element(ET.PI)\n"
" element.text = target\n"
" if text:\n"
" element.text = element.text + ' ' + text\n"
" return element\n"
" elem = cElementTree.Element(ET.PI)\n"
" elem.text = text\n"
" return elem\n"
"cElementTree.PI = cElementTree.ProcessingInstruction = PI\n"
" def __eq__(self, other):\n"
" return ET.PI == other\n"
"cElementTree.PI = cElementTree.ProcessingInstruction = PIProxy()\n"
"def XML(text):\n" /* public */
" parser = cElementTree.XMLParser()\n"
......@@ -2771,25 +3039,34 @@ PyInit__elementtree(void)
"def XMLID(text):\n" /* public */
" tree = XML(text)\n"
" ids = {}\n"
" for elem in tree.getiterator():\n"
" for elem in tree.iter():\n"
" id = elem.get('id')\n"
" if id:\n"
" ids[id] = elem\n"
" return tree, ids\n"
"cElementTree.XMLID = XMLID\n"
"try:\n"
" register_namespace = ET.register_namespace\n"
"except AttributeError:\n"
" def register_namespace(prefix, uri):\n"
" ET._namespace_map[uri] = prefix\n"
"cElementTree.register_namespace = register_namespace\n"
"cElementTree.dump = ET.dump\n"
"cElementTree.ElementPath = ElementPath = ET.ElementPath\n"
"cElementTree.iselement = ET.iselement\n"
"cElementTree.QName = ET.QName\n"
"cElementTree.tostring = ET.tostring\n"
"cElementTree.fromstringlist = ET.fromstringlist\n"
"cElementTree.tostringlist = ET.tostringlist\n"
"cElementTree.VERSION = '" VERSION "'\n"
"cElementTree.__version__ = '" VERSION "'\n"
"cElementTree.XMLParserError = SyntaxError\n"
);
PyRun_String(bootstrap, Py_file_input, g, NULL);
if (!PyRun_String(bootstrap, Py_file_input, g, NULL))
return NULL;
elementpath_obj = PyDict_GetItemString(g, "ElementPath");
......@@ -2804,22 +3081,30 @@ PyInit__elementtree(void)
}
} else
PyErr_Clear();
elementtree_deepcopy_obj = PyDict_GetItemString(g, "deepcopy");
elementtree_getiterator_obj = PyDict_GetItemString(g, "getiterator");
elementtree_iter_obj = PyDict_GetItemString(g, "iter");
elementtree_itertext_obj = PyDict_GetItemString(g, "itertext");
#if defined(USE_PYEXPAT_CAPI)
/* link against pyexpat, if possible */
capi = PyCapsule_Import(PyExpat_CAPSULE_NAME, 0);
if (capi &&
strcmp(capi->magic, PyExpat_CAPI_MAGIC) == 0 &&
capi->size <= sizeof(*expat_capi) &&
capi->MAJOR_VERSION == XML_MAJOR_VERSION &&
capi->MINOR_VERSION == XML_MINOR_VERSION &&
capi->MICRO_VERSION == XML_MICRO_VERSION)
expat_capi = capi;
else
expat_capi = PyCapsule_Import(PyExpat_CAPSULE_NAME, 0);
if (expat_capi) {
/* check that it's usable */
if (strcmp(expat_capi->magic, PyExpat_CAPI_MAGIC) != 0 ||
expat_capi->size < sizeof(struct PyExpat_CAPI) ||
expat_capi->MAJOR_VERSION != XML_MAJOR_VERSION ||
expat_capi->MINOR_VERSION != XML_MINOR_VERSION ||
expat_capi->MICRO_VERSION != XML_MICRO_VERSION)
expat_capi = NULL;
}
#endif
return m;
elementtree_parseerror_obj = PyErr_NewException(
"cElementTree.ParseError", PyExc_SyntaxError, NULL
);
Py_INCREF(elementtree_parseerror_obj);
PyModule_AddObject(m, "ParseError", elementtree_parseerror_obj);
return m;
}
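Reviewer note: `expat_set_error` and the `ParseError` registration above replace the bare `SyntaxError` with a subclass that carries a `position` attribute holding `(line, column)`. A minimal sketch of how a caller might use it follows. It assumes a current Python 3 `xml.etree.ElementTree`, and `describe_parse_failure` is a hypothetical helper, not part of the module.

```python
import xml.etree.ElementTree as ET


def describe_parse_failure(text):
    """Return the (line, column) of the first parse error, or None if text parses."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        # position is the tuple attached by the parser when it raises.
        return exc.position
    return None


good = describe_parse_failure("<root><child/></root>")
bad = describe_parse_failure("<root><child></root>")  # mismatched tag

# ParseError derives from SyntaxError, so existing handlers keep working.
still_a_syntax_error = issubclass(ET.ParseError, SyntaxError)

print(good, bad, still_a_syntax_error)
```

Because `ParseError` subclasses `SyntaxError`, code written against the old behaviour (`except SyntaxError:`) should continue to catch these errors.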
......@@ -1006,8 +1006,6 @@ def add_files(db):
lib.add_file("audiotest.au")
lib.add_file("cfgparser.1")
lib.add_file("sgml_input.html")
lib.add_file("test.xml")
lib.add_file("test.xml.out")
lib.add_file("testtar.tar")
lib.add_file("test_difflib_expect.html")
lib.add_file("check_soundcard.vbs")
......@@ -1019,6 +1017,9 @@ def add_files(db):
lib.add_file("zipdir.zip")
if dir=='decimaltestdata':
lib.glob("*.decTest")
if dir=='xmltestdata':
lib.glob("*.xml")
lib.add_file("test.xml.out")
if dir=='output':
lib.glob("test_*")
if dir=='idlelib':
......