:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>

The :mod:`xml.etree.ElementTree` module implements a simple and efficient API
for parsing and creating XML data.

.. versionchanged:: 3.3
   This module will use a fast implementation whenever available.
   The :mod:`xml.etree.cElementTree` module is deprecated.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data.  If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short).  The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree.  ``ET`` has two classes for this purpose:
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree.  Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level.  Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the following XML document as the sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can import this data by reading from a file::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Or directly from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree.  Other parsing functions may
create an :class:`ElementTree`.  Check the documentation to be sure.
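
As a quick check of that difference, the following sketch parses the same
document both ways.  The sample string and the use of ``io.StringIO`` as a
stand-in for a real file are assumptions made only for this illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document used only for this illustration.
xml_text = '<data><country name="Liechtenstein"/></data>'

# fromstring() returns the root Element directly.
root_from_string = ET.fromstring(xml_text)

# parse() accepts a filename or file object and returns an ElementTree;
# getroot() is needed to reach the root Element.
tree = ET.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

print(type(root_from_string).__name__)  # Element
print(type(tree).__name__)              # ElementTree
print(root_from_file.tag)               # data
```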

As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has children nodes over which we can iterate::

   >>> for child in root:
   ...   print(child.tag, child.attrib)
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'


.. note::

   Not all elements of the XML input will end up as elements of the
   parsed tree. Currently, this module skips over any XML comments,
   processing instructions, and document type declarations in the
   input. Nevertheless, trees built using this module's API rather
   than parsing from XML text can have comments and processing
   instructions in them; they will be included when generating XML
   output. A document type declaration may be accessed by passing a
   custom :class:`TreeBuilder` instance to the :class:`XMLParser`
   constructor.
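
The asymmetry described in the note can be observed with a small sketch.  The
two tiny documents below are invented for illustration; the first is parsed
from text, the second is built through the API.

```python
import xml.etree.ElementTree as ET

# Parsing: the comment in the input does not become a node in the tree.
parsed = ET.fromstring('<a><!-- dropped --><b/></a>')
parsed_children = [child.tag for child in parsed]

# Building: a comment appended through the API is kept and serialized.
built = ET.Element('a')
built.append(ET.Comment('kept'))
built_xml = ET.tostring(built, encoding='unicode')

print(parsed_children)  # ['b']
print(built_xml)        # <a><!--kept--></a>
```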


.. _elementtree-pull-parsing:

Pull API for non-blocking parsing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Most parsing functions provided by this module require the whole document
to be read at once before returning any result.  It is possible to use an
:class:`XMLParser` and feed data into it incrementally, but it is a push API that
calls methods on a callback target, which is too low-level and inconvenient for
most needs.  Sometimes what the user really wants is to be able to parse XML
incrementally, without blocking operations, while enjoying the convenience of
fully constructed :class:`Element` objects.

The most powerful tool for doing this is :class:`XMLPullParser`.  It does not
require a blocking read to obtain the XML data, and is instead fed with data
incrementally with :meth:`XMLPullParser.feed` calls.  To get the parsed XML
elements, call :meth:`XMLPullParser.read_events`.  Here is an example::

   >>> parser = ET.XMLPullParser(['start', 'end'])
   >>> parser.feed('<mytag>sometext')
   >>> list(parser.read_events())
   [('start', <Element 'mytag' at 0x7fa66db2be58>)]
   >>> parser.feed(' more text</mytag>')
   >>> for event, elem in parser.read_events():
   ...   print(event)
   ...   print(elem.tag, 'text=', elem.text)
   ...
   end
   mytag text= sometext more text

The obvious use case is applications that operate in a non-blocking fashion
where the XML data is being received from a socket or read incrementally from
some storage device.  In such cases, blocking reads are unacceptable.

Because it's so flexible, :class:`XMLPullParser` can be inconvenient to use for
simpler use-cases.  If you don't mind your application blocking on reading XML
data but would still like to have incremental parsing capabilities, take a look
at :func:`iterparse`.  It can be useful when you're reading a large XML document
and don't want to hold it wholly in memory.
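
A minimal sketch of that :func:`iterparse` pattern follows.  The in-memory
``io.BytesIO`` object is an assumption standing in for a large file on disk,
and clearing each element after use is one possible way to limit memory, not
the only one.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for a large file on disk.
source = io.BytesIO(
    b'<data>'
    b'<country name="Liechtenstein"><rank>1</rank></country>'
    b'<country name="Singapore"><rank>4</rank></country>'
    b'</data>'
)

ranks = {}
# With the default events, only "end" events are reported, so each
# element is fully populated when it is yielded.
for event, elem in ET.iterparse(source):
    if elem.tag == 'country':
        ranks[elem.get('name')] = int(elem.find('rank').text)
        # Drop the children and attributes that are no longer needed.
        elem.clear()

print(ranks)  # {'Liechtenstein': 1, 'Singapore': 4}
```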

Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over all
the sub-tree below it (its children, their children, and so on).  For example,
:meth:`Element.iter`::

   >>> for neighbor in root.iter('neighbor'):
   ...   print(neighbor.attrib)
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only elements with a tag which are direct
children of the current element.  :meth:`Element.find` finds the *first* child
with a particular tag, and :attr:`Element.text` accesses the element's text
content.  :meth:`Element.get` accesses the element's attributes::

   >>> for country in root.findall('country'):
   ...   rank = country.find('rank').text
   ...   name = country.get('name')
   ...   print(name, rank)
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68

More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.
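
The difference between searching direct children and searching the whole
subtree is easy to miss, so here is a small sketch of it.  The trimmed-down
document is an assumption made for brevity.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data>'
    '<country name="Panama">'
    '<neighbor name="Costa Rica"/>'
    '<neighbor name="Colombia"/>'
    '</country>'
    '</data>'
)

# findall() with a plain tag only looks at direct children of root.
direct = root.findall('neighbor')

# iter() walks the whole subtree.
nested = [n.get('name') for n in root.iter('neighbor')]

# find() returns None when nothing matches, so check before using .text.
missing = root.find('rank')

print(direct)   # []
print(nested)   # ['Costa Rica', 'Colombia']
print(missing)  # None
```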

Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write them to files.
The :meth:`ElementTree.write` method serves this purpose.

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).
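
The three kinds of change just listed can be sketched on a single element
without touching any file.  The one-country document is a made-up sample.

```python
import xml.etree.ElementTree as ET

country = ET.fromstring('<country name="Singapore"><rank>4</rank></country>')

# Change a field directly.
country.find('rank').text = '5'

# Add (or overwrite) an attribute.
country.set('updated', 'yes')

# Append a new child element.
neighbor = ET.Element('neighbor', {'name': 'Malaysia'})
country.append(neighbor)

result = ET.tostring(country, encoding='unicode')
print(result)
```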

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...   new_rank = int(rank.text) + 1
   ...   rank.text = str(new_rank)
   ...   rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`.  Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...   rank = int(country.find('rank').text)
   ...   if rank > 50:
   ...     root.remove(country)
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>

Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>
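
Attributes and text can be supplied while building as well.  The following is
a sketch with invented element names and values, showing both ways of passing
attributes to :func:`SubElement`.

```python
import xml.etree.ElementTree as ET

data = ET.Element('data')
# Attributes can be passed as a dictionary or as keyword arguments.
country = ET.SubElement(data, 'country', {'name': 'Panama'})
rank = ET.SubElement(country, 'rank', updated='yes')
rank.text = '68'

# Wrap the root in an ElementTree when the document should be written
# to a file with ElementTree.write().
tree = ET.ElementTree(data)

xml_text = ET.tostring(data, encoding='unicode')
print(xml_text)
# <data><country name="Panama"><rank updated="yes">68</rank></country></data>
```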

Parsing XML with Namespaces
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the XML input has `namespaces
<https://en.wikipedia.org/wiki/XML_namespace>`__, tags and attributes
with prefixes in the form ``prefix:sometag`` get expanded to
``{uri}sometag`` where the *prefix* is replaced by the full *URI*.
Also, if there is a `default namespace
<https://www.w3.org/TR/2006/REC-xml-names-20060816/#defaulting>`__,
that full URI gets prepended to all of the non-prefixed tags.

Here is an XML example that incorporates two namespaces, one with the
prefix "fictional" and the other serving as the default namespace:

.. code-block:: xml

    <?xml version="1.0"?>
    <actors xmlns:fictional="http://characters.example.com"
            xmlns="http://people.example.com">
        <actor>
            <name>John Cleese</name>
            <fictional:character>Lancelot</fictional:character>
            <fictional:character>Archie Leach</fictional:character>
        </actor>
        <actor>
            <name>Eric Idle</name>
            <fictional:character>Sir Robin</fictional:character>
            <fictional:character>Gunther</fictional:character>
            <fictional:character>Commander Clement</fictional:character>
        </actor>
    </actors>

One way to search and explore this XML example is to manually add the
URI to every tag or attribute in the xpath of a
:meth:`~Element.find` or :meth:`~Element.findall`::

    root = ET.fromstring(xml_text)
    for actor in root.findall('{http://people.example.com}actor'):
        name = actor.find('{http://people.example.com}name')
        print(name.text)
        for char in actor.findall('{http://characters.example.com}character'):
            print(' |-->', char.text)

A better way to search the namespaced XML example is to create a
dictionary with your own prefixes and use those in the search functions::

    ns = {'real_person': 'http://people.example.com',
          'role': 'http://characters.example.com'}

    for actor in root.findall('real_person:actor', ns):
        name = actor.find('real_person:name', ns)
        print(name.text)
        for char in actor.findall('role:character', ns):
            print(' |-->', char.text)

These two approaches both output::

    John Cleese
     |--> Lancelot
     |--> Archie Leach
    Eric Idle
     |--> Sir Robin
     |--> Gunther
     |--> Commander Clement
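
For reference, here is a self-contained sketch of the dictionary approach on a
shortened copy of the document.  The short prefixes ``p`` and ``c`` are
arbitrary choices for this illustration.

```python
import xml.etree.ElementTree as ET

xml_text = '''\
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
    </actor>
</actors>'''

root = ET.fromstring(xml_text)

# Tags are stored in the expanded {uri}tag form.
print(root.tag)  # {http://people.example.com}actors

# The prefixes in this mapping are local to the search; they do not
# have to match the prefixes used in the document.
ns = {'p': 'http://people.example.com',
      'c': 'http://characters.example.com'}
names = [e.text for e in root.findall('p:actor/p:name', ns)]
roles = [e.text for e in root.findall('.//c:character', ns)]
print(names, roles)  # ['John Cleese'] ['Lancelot']
```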


Additional resources
^^^^^^^^^^^^^^^^^^^^

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs.


.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <https://www.w3.org/TR/xpath>`_ for locating elements in a
tree.  The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

Example
^^^^^^^

Here's an example that demonstrates some of the XPath capabilities of the
module.  We'll be using the ``countrydata`` XML document from the
:ref:`Parsing XML <elementtree-parsing-xml>` section::

   import xml.etree.ElementTree as ET

   root = ET.fromstring(countrydata)

   # Top-level elements
   root.findall(".")

   # All 'neighbor' grand-children of 'country' children of the top-level
   # elements
   root.findall("./country/neighbor")

   # Nodes with name='Singapore' that have a 'year' child
   root.findall(".//year/..[@name='Singapore']")

   # 'year' nodes that are children of nodes with name='Singapore'
   root.findall(".//*[@name='Singapore']/year")

   # All 'neighbor' nodes that are the second 'neighbor' child of their parent
   root.findall(".//neighbor[2]")
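
To see what such queries return, the sketch below runs several of them against
a shortened two-country copy of the sample data.  The ``names`` helper is
hypothetical and exists only to keep the output compact.

```python
import xml.etree.ElementTree as ET

countrydata = '''\
<data>
    <country name="Liechtenstein">
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>'''
root = ET.fromstring(countrydata)


def names(elements):
    """Return the ``name`` attribute of each element."""
    return [e.get('name') for e in elements]


grandchildren = names(root.findall("./country/neighbor"))
parents = names(root.findall(".//year/..[@name='Singapore']"))
years = [e.text for e in root.findall(".//*[@name='Singapore']/year")]
second = names(root.findall(".//neighbor[2]"))

print(grandchildren)  # ['Austria', 'Switzerland', 'Malaysia']
print(parents)        # ['Singapore']
print(years)          # ['2011']
print(second)         # ['Switzerland']
```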

Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

.. tabularcolumns:: |l|L|

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, and ``spam/egg`` selects all         |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``.                                            |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements.  For example, ``*/egg``  |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node.  This is mostly useful     |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element.  For example, ``.//egg`` selects    |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element.  Returns ``None`` if the |
|                       | path attempts to reach the ancestors of the start    |
|                       | element (the element ``find`` was called on).        |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value.  The value cannot contain       |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``.  Only immediate children are supported.     |
+-----------------------+------------------------------------------------------+
| ``[tag='text']``      | Selects all elements that have a child named         |
|                       | ``tag`` whose complete text content, including       |
|                       | descendants, equals the given ``text``.              |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position.  The position can be either an integer     |
|                       | (1 is the first position), the expression ``last()`` |
|                       | (for the last position), or a position relative to   |
|                       | the last position (e.g. ``last()-1``).               |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate.  ``position`` predicates must be
preceded by a tag name.
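
The predicate forms from the table can be tried out with a short sketch.  The
document and the ``names`` helper are invented for this illustration; the
extra ``region`` element is there to show that position predicates count only
elements with the named tag.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>1</rank></country>'
    '<country name="Singapore"><rank>4</rank></country>'
    '<country name="Panama"><rank>68</rank></country>'
    '<region/>'
    '</data>'
)


def names(path):
    """Return the ``name`` attribute of every element matching *path*."""
    return [e.get('name') for e in root.findall(path)]


with_attr = names("country[@name]")
by_child_text = names("country[rank='4']")
last = names("country[last()]")
before_last = names("country[last()-1]")
# A predicate may follow another predicate.
chained = names("*[rank][@name='Panama']")

print(with_attr)      # ['Liechtenstein', 'Singapore', 'Panama']
print(by_child_text)  # ['Singapore']
print(last)           # ['Panama']
print(before_last)    # ['Singapore']
print(chained)        # ['Panama']
```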

Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^


.. function:: Comment(text=None)

   Comment element factory.  This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer.  The
   comment string can be either a bytestring or a Unicode string.  *text* is a
   string containing the comment string.  Returns an element instance
   representing a comment.

   Note that :class:`XMLParser` skips over comments in the input
   instead of creating comment objects for them. An :class:`ElementTree` will
   only contain comment nodes if they have been inserted into
   the tree using one of the :class:`Element` methods.

.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``.  This function
   should be used for debugging only.

   The exact output format is implementation dependent.  In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.


.. function:: fromstring(text)

   Parses an XML section from a string constant.  Same as :func:`XML`.  *text*
   is a string containing XML data.  Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments.  *sequence* is a
   list or other sequence containing XML data fragments.  *parser* is an
   optional parser instance.  If not given, the standard :class:`XMLParser`
   parser is used.  Returns an :class:`Element` instance.

   .. versionadded:: 3.2


.. function:: iselement(element)

   Checks if an object appears to be a valid element object.  *element* is an
   element instance.  Returns a true value if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user.  *source* is a filename or :term:`file object`
   containing XML data.  *events* is a sequence of events to report back.  The
   supported events are the strings ``"start"``, ``"end"``, ``"start-ns"`` and
   ``"end-ns"`` (the "ns" events are used to get detailed namespace
   information).  If *events* is omitted, only ``"end"`` events are reported.
   *parser* is an optional parser instance.  If not given, the standard
   :class:`XMLParser` parser is used.  *parser* must be a subclass of
   :class:`XMLParser` and can only use the default :class:`TreeBuilder` as a
   target.  Returns an :term:`iterator` providing ``(event, elem)`` pairs.

   Note that while :func:`iterparse` builds the tree incrementally, it issues
   blocking reads on *source* (or the file it names).  As such, it's unsuitable
   for applications where blocking reads can't be made.  For fully non-blocking
   parsing, see :class:`XMLPullParser`.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">" character of a
      starting tag when it emits a "start" event, so the attributes are defined,
      but the contents of the text and tail attributes are undefined at that
      point.  The same applies to the element children; they may or may not be
      present.

      If you need a fully populated element, look for "end" events instead.

   .. deprecated:: 3.4
      The *parser* argument.

.. function:: parse(source, parser=None)

   Parses an XML section into an element tree.  *source* is a filename or file
   object containing XML data.  *parser* is an optional parser instance.  If
   not given, the standard :class:`XMLParser` parser is used.  Returns an
   :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target, text=None)

   PI element factory.  This factory function creates a special element that
   will be serialized as an XML processing instruction.  *target* is a string
   containing the PI target.  *text* is a string containing the PI contents, if
   given.  Returns an element instance, representing a processing instruction.

   Note that :class:`XMLParser` skips over processing instructions
   in the input instead of creating processing instruction objects for them.
   An :class:`ElementTree` will only contain processing instruction nodes if
   they have been inserted into the tree using one of the
   :class:`Element` methods.

.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix.  The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix.  *uri* is a namespace uri.  Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 3.2
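
As a sketch of the effect of :func:`register_namespace` on serialization: the
URI and the ``fictional`` prefix below are examples only, and because the
registry is global the call affects all later serialization in the process.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and prefix chosen for this illustration.
uri = 'http://characters.example.com'
elem = ET.Element('{%s}character' % uri)
elem.text = 'Lancelot'

# After registration, the serializer uses the registered prefix for
# this URI instead of an auto-generated one.
ET.register_namespace('fictional', uri)
xml_text = ET.tostring(elem, encoding='unicode')
print(xml_text)
```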


.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory.  This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings.  *parent* is the parent element.  *tag* is
   the subelement name.  *attrib* is an optional dictionary, containing element
   attributes.  *extra* contains additional attributes, given as keyword
   arguments.  Returns an element instance.


.. function:: tostring(element, encoding="us-ascii", method="xml", *, \
                       short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements.  *element* is an :class:`Element` instance.  *encoding* [1]_ is
   the output encoding (default is US-ASCII).  Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated).  *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
   Returns an (optionally) encoded string containing the XML data.

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.
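
The effect of the :func:`tostring` parameters can be compared side by side
with a sketch like the following.  The small HTML-like fragment is an invented
sample; the last line checks the joining property that :func:`tostringlist`
documents.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring('<p>caf\u00e9 <b>au lait</b><br/></p>')

# Default: a bytestring in US-ASCII, with character references for
# anything outside that encoding.
as_bytes = ET.tostring(elem)

# encoding="unicode" produces a str instead of bytes.
as_text = ET.tostring(elem, encoding='unicode')

# method="text" keeps only the text content.
only_text = ET.tostring(elem, encoding='unicode', method='text')

# short_empty_elements=False writes <br></br> instead of <br />.
long_form = ET.tostring(elem, encoding='unicode',
                        short_empty_elements=False)

print(as_bytes)   # b'<p>caf&#233; <b>au lait</b><br /></p>'
print(only_text)  # café au lait
print(b"".join(ET.tostringlist(elem)) == as_bytes)  # True
```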


.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \
                           short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements.  *element* is an :class:`Element` instance.  *encoding* [1]_ is
   the output encoding (default is US-ASCII).  Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated).  *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
   Returns a list of (optionally) encoded strings containing the XML data.
   It does not guarantee any specific sequence, except that
   ``b"".join(tostringlist(element)) == tostring(element)``.

   .. versionadded:: 3.2

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.


.. function:: XML(text, parser=None)

   Parses an XML section from a string constant.  This function can be used to
   embed "XML literals" in Python code.  *text* is a string containing XML
   data.  *parser* is an optional parser instance.  If not given, the standard
   :class:`XMLParser` parser is used.  Returns an :class:`Element` instance.


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element IDs to elements.  *text* is a string containing XML
   data.  *parser* is an optional parser instance.  If not given, the standard
   :class:`XMLParser` parser is used.  Returns a tuple containing an
   :class:`Element` instance and a dictionary.
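
A brief sketch of :func:`XMLID` on an invented document, assuming the IDs are
given as plain ``id`` attributes.  Elements without an ``id`` attribute do not
appear in the dictionary.

```python
import xml.etree.ElementTree as ET

root, ids = ET.XMLID(
    '<doc id="top"><item id="a"/><item/><item id="b"/></doc>'
)

print(sorted(ids))          # ['a', 'b', 'top']
print(ids['a'] is root[0])  # True
print(ids['top'] is root)   # True
```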


.. _elementtree-element-objects:

Element Objects
^^^^^^^^^^^^^^^

.. class:: Element(tag, attrib={}, **extra)

   Element class.  This class defines the Element interface, and provides a
   reference implementation of this interface.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings.  *tag* is the element name.  *attrib* is
   an optional dictionary, containing element attributes.  *extra* contains
   additional attributes, given as keyword arguments.


   .. attribute:: tag

      A string identifying what kind of data this element represents (the
      element type, in other words).


   .. attribute:: text
                  tail

      These attributes can be used to hold additional data associated with
      the element.  Their values are usually strings but may be any
      application-specific object.  If the element is created from
      an XML file, the *text* attribute holds either the text between
      the element's start tag and its first child or end tag, or ``None``, and
      the *tail* attribute holds either the text between the element's
      end tag and the next tag, or ``None``.  For the XML data

      .. code-block:: xml

         <a><b>1<c>2<d/>3</c></b>4</a>

      the *a* element has ``None`` for both *text* and *tail* attributes,
      the *b* element has *text* ``"1"`` and *tail* ``"4"``,
      the *c* element has *text* ``"2"`` and *tail* ``None``,
      and the *d* element has *text* ``None`` and *tail* ``"3"``.

      To collect the inner text of an element, see :meth:`itertext`, for
      example ``"".join(element.itertext())``.

      Applications may store arbitrary objects in these attributes.


   .. attribute:: attrib

      A dictionary containing the element's attributes.  Note that while the
      *attrib* value is always a real mutable Python dictionary, an ElementTree
      implementation may choose to use another internal representation, and
      create the dictionary only if someone asks for it.  To take advantage of
      such implementations, use the dictionary methods below whenever possible.

   The following dictionary-like methods work on the element attributes.


   .. method:: clear()

      Resets an element.  This function removes all subelements, clears all
      attributes, and sets the text and tail attributes to ``None``.


   .. method:: get(key, default=None)

      Gets the element attribute named *key*.

      Returns the attribute value, or *default* if the attribute was not found.


   .. method:: items()

      Returns the element attributes as a sequence of (name, value) pairs.  The
      attributes are returned in an arbitrary order.


   .. method:: keys()

      Returns the element's attribute names as a list.  The names are returned
      in an arbitrary order.


   .. method:: set(key, value)

      Set the attribute *key* on the element to *value*.

   The following methods work on the element's children (subelements).


   .. method:: append(subelement)

      Adds the element *subelement* to the end of this element's internal list
      of subelements.  Raises :exc:`TypeError` if *subelement* is not an
      :class:`Element`.


   .. method:: extend(subelements)

      Appends *subelements* from a sequence object with zero or more elements.
      Raises :exc:`TypeError` if a subelement is not an :class:`Element`.

      .. versionadded:: 3.2


   .. method:: find(match, namespaces=None)

      Finds the first subelement matching *match*.  *match* may be a tag name
      or a :ref:`path <elementtree-xpath>`.  Returns an element instance
      or ``None``.  *namespaces* is an optional mapping from namespace prefix
      to full name.


   .. method:: findall(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`.  Returns a list containing all matching
      elements in document order.  *namespaces* is an optional mapping from
      namespace prefix to full name.


   .. method:: findtext(match, default=None, namespaces=None)

      Finds text for the first subelement matching *match*.  *match* may be
      a tag name or a :ref:`path <elementtree-xpath>`.  Returns the text content
      of the first matching element, or *default* if no element was found.
      Note that if the matching element has no text content an empty string
759 760
      is returned. *namespaces* is an optional mapping from namespace prefix
      to full name.
761 762
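
      The following sketch shows how the three lookup methods differ and how
      the *namespaces* mapping is used.  The namespace URI, the ``g`` prefix
      and the sample data are assumptions chosen for illustration::

         import xml.etree.ElementTree as ET

         root = ET.fromstring(
             '<data xmlns:g="http://example.com/geo">'
             '<g:country><g:name>Panama</g:name><g:motto/></g:country>'
             '<g:country><g:name>Malta</g:name></g:country>'
             '</data>'
         )
         ns = {"g": "http://example.com/geo"}     # prefix -> full namespace name

         first = root.find("g:country", ns)        # first match, or None
         names = [c.findtext("g:name", namespaces=ns)
                  for c in root.findall("g:country", ns)]
         motto = first.findtext("g:motto", namespaces=ns)             # no text
         missing = first.findtext("g:capital", "n/a", namespaces=ns)  # no element

      Here ``motto`` is an empty string because the element exists but has no
      text, while ``missing`` falls back to the supplied default.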


   .. method:: getchildren()

      .. deprecated:: 3.2
         Use ``list(elem)`` or iteration.


   .. method:: getiterator(tag=None)

      .. deprecated:: 3.2
         Use method :meth:`Element.iter` instead.


   .. method:: insert(index, subelement)

      Inserts *subelement* at the given position in this element.  Raises
      :exc:`TypeError` if *subelement* is not an :class:`Element`.
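
      A short sketch of :meth:`append`, :meth:`extend` and :meth:`insert`
      working together; the tag names are arbitrary placeholders::

         import xml.etree.ElementTree as ET

         root = ET.Element("data")
         root.append(ET.Element("b"))                     # children: b
         root.extend([ET.Element("c"), ET.Element("d")])  # children: b, c, d
         root.insert(0, ET.Element("a"))                  # children: a, b, c, d

         tags = [child.tag for child in root]

         try:
             root.append("not an element")                # wrong type
         except TypeError:
             rejected = True
         else:
             rejected = False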


   .. method:: iter(tag=None)

      Creates a tree :term:`iterator` with the current element as the root.
      The iterator iterates over this element and all elements below it, in
      document (depth first) order.  If *tag* is not ``None`` or ``'*'``, only
      elements whose tag equals *tag* are returned from the iterator.  If the
      tree structure is modified during iteration, the result is undefined.

      .. versionadded:: 3.2


   .. method:: iterfind(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`.  Returns an iterable yielding all
      matching elements in document order.  *namespaces* is an optional mapping
      from namespace prefix to full name.

      .. versionadded:: 3.2


   .. method:: itertext()

      Creates a text iterator.  The iterator loops over this element and all
      subelements, in document order, and returns all inner text.

      .. versionadded:: 3.2
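
      To illustrate the difference between :meth:`iter` and :meth:`itertext`,
      here is a small sketch on a made-up paragraph element::

         import xml.etree.ElementTree as ET

         root = ET.fromstring(
             "<p>Moved to <a>example.org</a> or <a>example.com</a>.</p>")

         all_tags = [elem.tag for elem in root.iter()]   # root, then descendants
         link_texts = [a.text for a in root.iter("a")]   # only elements tagged "a"
         full_text = "".join(root.itertext())            # all text, document order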


   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element.  Do not
      call this method; use the :func:`SubElement` factory function instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element.  Unlike the find\* methods this
      method compares elements based on the instance identity, not on tag value
      or contents.
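
      The identity-based comparison can be seen in the sketch below.  The
      sample document is invented, and catching :exc:`ValueError` for a
      non-child reflects observed behavior rather than anything stated
      above::

         import xml.etree.ElementTree as ET

         root = ET.fromstring("<data><item/><item/><other/></data>")

         # findall() returns a list, so removing while looping over it is safe.
         for item in root.findall("item"):
             root.remove(item)

         remaining = [child.tag for child in root]

         # An element that merely looks like a child is not accepted.
         try:
             root.remove(ET.Element("other"))
         except ValueError:
             not_a_child = True
         else:
             not_a_child = False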

   :class:`Element` objects also support the following sequence type methods
   for working with subelements: :meth:`~object.__delitem__`,
   :meth:`~object.__getitem__`, :meth:`~object.__setitem__`,
   :meth:`~object.__len__`.

   Caution: Elements with no subelements will test as ``False``.  This behavior
   will change in future versions.  Use an explicit ``len(elem)`` or ``elem is
   None`` test instead. ::

     element = root.find('foo')

     if not element:  # careful!
         print("element not found, or element has no subelements")

     if element is None:
         print("element not found")


.. _elementtree-elementtree-objects:

ElementTree Objects
^^^^^^^^^^^^^^^^^^^


.. class:: ElementTree(element=None, file=None)

   ElementTree wrapper class.  This class represents an entire element
   hierarchy, and adds some extra support for serialization to and from
   standard XML.

   *element* is the root element.  The tree is initialized with the contents
   of the XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree.  This discards the current
      contents of the tree, and replaces it with the given element.  Use with
      care.  *element* is an element instance.


   .. method:: find(match, namespaces=None)

      Same as :meth:`Element.find`, starting at the root of the tree.


   .. method:: findall(match, namespaces=None)

      Same as :meth:`Element.findall`, starting at the root of the tree.


   .. method:: findtext(match, default=None, namespaces=None)

      Same as :meth:`Element.findtext`, starting at the root of the tree.


   .. method:: getiterator(tag=None)

      .. deprecated:: 3.2
         Use method :meth:`ElementTree.iter` instead.


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: iter(tag=None)

      Creates and returns a tree iterator for the root element.  The iterator
      loops over all elements in this tree, in document order.  *tag* is the tag
      to look for (default is to return all elements).


   .. method:: iterfind(match, namespaces=None)

      Same as :meth:`Element.iterfind`, starting at the root of the tree.

      .. versionadded:: 3.2


   .. method:: parse(source, parser=None)

      Loads an external XML section into this element tree.  *source* is a file
      name or :term:`file object`.  *parser* is an optional parser instance.
      If not given, the standard :class:`XMLParser` parser is used.  Returns the
      section root element.
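
      A minimal sketch of :meth:`parse` reading from a file object.  An
      in-memory :class:`io.BytesIO` stands in for a real file here, and the
      sample data is invented::

         import io
         import xml.etree.ElementTree as ET

         source = io.BytesIO(b"<data><country name='Panama'/></data>")

         tree = ET.ElementTree()        # empty tree, no root yet
         root = tree.parse(source)      # parse() returns the root element

         same_root = root is tree.getroot()
         country_names = [c.get("name") for c in tree.iter("country")]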


   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                     default_namespace=None, method="xml", *, \
                     short_empty_elements=True)

      Writes the element tree to a file, as XML.  *file* is a file name, or a
      :term:`file object` opened for writing.  *encoding* [1]_ is the output
      encoding (default is US-ASCII).
      *xml_declaration* controls if an XML declaration should be added to the
      file.  Use ``False`` for never, ``True`` for always, ``None``
      for only if not US-ASCII or UTF-8 or Unicode (default is ``None``).
      *default_namespace* sets the default XML namespace (for "xmlns").
      *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
      ``"xml"``).
      The keyword-only *short_empty_elements* parameter controls the formatting
      of elements that contain no content.  If ``True`` (the default), they are
      emitted as a single self-closed tag, otherwise they are emitted as a pair
      of start/end tags.

      The output is either a string (:class:`str`) or binary (:class:`bytes`).
      This is controlled by the *encoding* argument.  If *encoding* is
      ``"unicode"``, the output is a string; otherwise, it's binary.  Note that
      this may conflict with the type of *file* if it's an open
      :term:`file object`; make sure you do not try to write a string to a
      binary stream and vice versa.

      .. versionadded:: 3.4
         The *short_empty_elements* parameter.
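
      The interplay between *encoding* and the type of *file* can be sketched
      as follows.  In-memory streams stand in for real files, and the element
      names are placeholders::

         import io
         import xml.etree.ElementTree as ET

         root = ET.Element("data")
         ET.SubElement(root, "empty")
         tree = ET.ElementTree(root)

         text_out = io.StringIO()       # text stream: use encoding="unicode"
         tree.write(text_out, encoding="unicode")

         bytes_out = io.BytesIO()       # binary stream: use a real encoding
         tree.write(bytes_out, encoding="utf-8", xml_declaration=True,
                    short_empty_elements=False)

         as_text = text_out.getvalue()      # a str
         as_bytes = bytes_out.getvalue()    # bytes, starting with a declaration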


This is the XML file that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    >>> tree.write("output.xhtml")

.. _elementtree-qname-objects:

QName Objects
^^^^^^^^^^^^^


.. class:: QName(text_or_uri, tag=None)

   QName wrapper.  This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output.  *text_or_uri* is a string
   containing the QName value, in the form {uri}local, or, if the tag argument
   is given, the URI part of a QName.  If *tag* is given, the first argument is
   interpreted as a URI, and this argument is interpreted as a local name.
   :class:`QName` instances are opaque.
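
   The two calling conventions can be compared with a short sketch.  The
   namespace URI is hypothetical, and converting with :func:`str` is used here
   only to inspect the resulting name::

      import xml.etree.ElementTree as ET

      uri = "http://example.com/ns"             # hypothetical namespace URI

      one_arg = ET.QName("{%s}item" % uri)      # full name in {uri}local form
      two_args = ET.QName(uri, "item")          # URI and local name separately
      same_name = str(one_arg) == str(two_args)

      root = ET.Element(two_args)               # a QName used as the tag
      xml = ET.tostring(root, encoding="unicode")

   On output the serializer declares a prefix for the namespace, so ``xml``
   contains the URI in an ``xmlns`` attribute rather than in the tag name.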



.. _elementtree-treebuilder-objects:

TreeBuilder Objects
^^^^^^^^^^^^^^^^^^^


.. class:: TreeBuilder(element_factory=None)

   Generic element structure builder.  This builder converts a sequence of
   start, data, and end method calls to a well-formed element structure.  You
   can use this class to build an element structure using a custom XML parser,
   or a parser for some other XML-like format.  *element_factory*, when given,
   must be a callable accepting two positional arguments: a tag and
   a dict of attributes.  It is expected to return a new element instance.

   .. method:: close()

      Flushes the builder buffers, and returns the toplevel document
      element.  Returns an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element.  *data* is a string.  This should be
      either a bytestring, or a Unicode string.


   .. method:: end(tag)

      Closes the current element.  *tag* is the element name.  Returns the
      closed element.


   .. method:: start(tag, attrs)

      Opens a new element.  *tag* is the element name.  *attrs* is a dictionary
      containing element attributes.  Returns the opened element.
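
      As a rough sketch, the builder can also be driven by hand, without any
      parser.  The tag and attribute names below are invented::

         import xml.etree.ElementTree as ET

         builder = ET.TreeBuilder()
         builder.start("greeting", {"lang": "en"})  # open <greeting lang="en">
         builder.data("Hello, ")
         builder.data("world")                      # consecutive text is joined
         inner = builder.start("br", {})            # start() returns the element
         builder.end("br")
         builder.end("greeting")                    # close the outermost element
         root = builder.close()                     # the toplevel element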


   In addition, a custom :class:`TreeBuilder` object can provide the
   following method:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration.  *name* is the doctype name.  *pubid* is
      the public identifier.  *system* is the system identifier.  This method
      does not exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 3.2


.. _elementtree-xmlparser-objects:

XMLParser Objects
^^^^^^^^^^^^^^^^^


.. class:: XMLParser(html=0, target=None, encoding=None)

   This class is the low-level building block of the module.  It uses
   :mod:`xml.parsers.expat` for efficient, event-based parsing of XML.  It can
   be fed XML data incrementally with the :meth:`feed` method, and parsing
   events are translated to a push API by invoking callbacks on the *target*
   object.  If *target* is omitted, the standard :class:`TreeBuilder` is used.
   The *html* argument was historically used for backwards compatibility and is
   now deprecated.  If *encoding* [1]_ is given, the value overrides the
   encoding specified in the XML file.

   .. deprecated:: 3.4
      The *html* argument.  The remaining arguments should be passed via
      keyword to prepare for the removal of the *html* argument.

   .. method:: close()

      Finishes feeding data to the parser.  Returns the result of calling the
      ``close()`` method of the *target* passed during construction; by default,
      this is the toplevel document element.


   .. method:: doctype(name, pubid, system)

      .. deprecated:: 3.2
         Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
         target.


   .. method:: feed(data)

      Feeds data to the parser.  *data* is encoded data.

   :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
   for each opening tag, its ``end(tag)`` method for each closing tag, and its
   ``data(data)`` method for text.  :meth:`XMLParser.close` calls
   *target*\'s method ``close()``.  :class:`XMLParser` can be used for more
   than building a tree structure.  This is an example of counting the maximum
   depth of an XML file::

    >>> from xml.etree.ElementTree import XMLParser
    >>> class MaxDepth:                     # The target object of the parser
    ...     maxDepth = 0
    ...     depth = 0
    ...     def start(self, tag, attrib):   # Called for each opening tag.
    ...         self.depth += 1
    ...         if self.depth > self.maxDepth:
    ...             self.maxDepth = self.depth
    ...     def end(self, tag):             # Called for each closing tag.
    ...         self.depth -= 1
    ...     def data(self, data):
    ...         pass            # We do not need to do anything with data.
    ...     def close(self):    # Called when all data has been parsed.
    ...         return self.maxDepth
    ...
    >>> target = MaxDepth()
    >>> parser = XMLParser(target=target)
    >>> exampleXml = """
    ... <a>
    ...   <b>
    ...   </b>
    ...   <b>
    ...     <c>
    ...       <d>
    ...       </d>
    ...     </c>
    ...   </b>
    ... </a>"""
    >>> parser.feed(exampleXml)
    >>> parser.close()
    4


.. _elementtree-xmlpullparser-objects:

XMLPullParser Objects
^^^^^^^^^^^^^^^^^^^^^

.. class:: XMLPullParser(events=None)

   A pull parser suitable for non-blocking applications.  Its input-side API is
   similar to that of :class:`XMLParser`, but instead of pushing calls to a
   callback target, :class:`XMLPullParser` collects an internal list of parsing
   events and lets the user read from it.  *events* is a sequence of events to
   report back.  The supported events are the strings ``"start"``, ``"end"``,
   ``"start-ns"`` and ``"end-ns"`` (the "ns" events are used to get detailed
   namespace information).  If *events* is omitted, only ``"end"`` events are
   reported.

   .. method:: feed(data)

      Feed the given bytes data to the parser.

   .. method:: close()

      Signal the parser that the data stream is terminated.  Unlike
      :meth:`XMLParser.close`, this method always returns :const:`None`.
      Any events not yet retrieved when the parser is closed can still be
      read with :meth:`read_events`.

   .. method:: read_events()

      Return an iterator over the events which have been encountered in the
      data fed to the parser.  The iterator yields ``(event, elem)`` pairs,
      where *event* is a string representing the type of event (e.g.
      ``"end"``) and *elem* is the encountered :class:`Element` object.

      Events provided in a previous call to :meth:`read_events` will not be
      yielded again.  Events are consumed from the internal queue only when
      they are retrieved from the iterator, so multiple readers iterating in
      parallel over iterators obtained from :meth:`read_events` will have
      unpredictable results.

   .. note::

      :class:`XMLPullParser` only guarantees that it has seen the ">"
      character of a starting tag when it emits a "start" event, so the
      attributes are defined, but the contents of the text and tail attributes
      are undefined at that point.  The same applies to the element children;
      they may or may not be present.

      If you need a fully populated element, look for "end" events instead.

   .. versionadded:: 3.4
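
   A minimal sketch of the feed-and-drain pattern, assuming the data arrives
   in arbitrary chunks (the chunk boundaries and tag names are made up).  How
   many events are available after each individual :meth:`feed` call depends
   on the underlying parser, so the sketch only relies on the combined
   result::

      import xml.etree.ElementTree as ET

      parser = ET.XMLPullParser(events=("start", "end"))
      seen = []

      # Hypothetical chunks, as they might arrive from a socket.
      for chunk in (b"<root><item>so", b"me text</item>", b"</root>"):
          parser.feed(chunk)
          # Drain whatever is available so far; this may be nothing at all.
          seen.extend((event, elem.tag) for event, elem in parser.read_events())

      parser.close()
      # Events still queued when the parser is closed remain readable.
      seen.extend((event, elem.tag) for event, elem in parser.read_events())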

Exceptions
^^^^^^^^^^

.. class:: ParseError

   XML parse error, raised by the various parsing methods in this module when
   parsing fails.  The string representation of an instance of this exception
   will contain a user-friendly error message.  In addition, it will have
   the following attributes available:

   .. attribute:: code

      A numeric error code from the expat parser. See the documentation of
      :mod:`xml.parsers.expat` for the list of error codes and their meanings.

   .. attribute:: position

      A tuple of *line*, *column* numbers, specifying where the error occurred.
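
   A small sketch of inspecting a :class:`ParseError`; the malformed input is
   invented for the purpose::

      import xml.etree.ElementTree as ET

      try:
          ET.fromstring("<data><unclosed></data>")   # deliberately malformed
      except ET.ParseError as err:
          message = str(err)             # human-readable description
          code = err.code                # numeric expat error code
          line, column = err.position    # where the parser gave up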

.. rubric:: Footnotes

.. [#] The encoding string included in XML output should conform to the
   appropriate standards.  For example, "UTF-8" is valid, but "UTF8" is
   not.  See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.