summaryrefslogtreecommitdiffstats
path: root/Doc/library/xml.etree.elementtree.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Doc/library/xml.etree.elementtree.rst')
-rw-r--r--Doc/library/xml.etree.elementtree.rst485
1 files changed, 396 insertions, 89 deletions
diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst
index cf0c33f..144e344 100644
--- a/Doc/library/xml.etree.elementtree.rst
+++ b/Doc/library/xml.etree.elementtree.rst
@@ -5,49 +5,323 @@
:synopsis: Implementation of the ElementTree API.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>
-**Source code:** :source:`Lib/xml/etree/ElementTree.py`
+The :mod:`xml.etree.ElementTree` module implements a simple and efficient API
+for parsing and creating XML data.
+
+.. versionchanged:: 3.3
+ This module will use a fast implementation whenever available.
+ The :mod:`xml.etree.cElementTree` module is deprecated.
+
+Tutorial
+--------
+
+This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
+short). The goal is to demonstrate some of the building blocks and basic
+concepts of the module.
+
+XML tree and elements
+^^^^^^^^^^^^^^^^^^^^^
+
+XML is an inherently hierarchical data format, and the most natural way to
+represent it is with a tree. ``ET`` has two classes for this purpose -
+:class:`ElementTree` represents the whole XML document as a tree, and
+:class:`Element` represents a single node in this tree. Interactions with
+the whole document (reading and writing to/from files) are usually done
+on the :class:`ElementTree` level. Interactions with a single XML element
+and its sub-elements are done on the :class:`Element` level.
+
+.. _elementtree-parsing-xml:
+
+Parsing XML
+^^^^^^^^^^^
+
+We'll be using the following XML document as the sample data for this section:
+
+.. code-block:: xml
+
+ <?xml version="1.0"?>
+ <data>
+ <country name="Liechtenstein">
+ <rank>1</rank>
+ <year>2008</year>
+ <gdppc>141100</gdppc>
+ <neighbor name="Austria" direction="E"/>
+ <neighbor name="Switzerland" direction="W"/>
+ </country>
+ <country name="Singapore">
+ <rank>4</rank>
+ <year>2011</year>
+ <gdppc>59900</gdppc>
+ <neighbor name="Malaysia" direction="N"/>
+ </country>
+ <country name="Panama">
+ <rank>68</rank>
+ <year>2011</year>
+ <gdppc>13600</gdppc>
+ <neighbor name="Costa Rica" direction="W"/>
+ <neighbor name="Colombia" direction="E"/>
+ </country>
+ </data>
+
+We can import this data by reading from a file::
+
+ import xml.etree.ElementTree as ET
+ tree = ET.parse('country_data.xml')
+ root = tree.getroot()
+
+Or directly from a string::
+
+ root = ET.fromstring(country_data_as_string)
+
+:func:`fromstring` parses XML from a string directly into an :class:`Element`,
+which is the root element of the parsed tree. Other parsing functions may
+create an :class:`ElementTree`. Check the documentation to be sure.
+
+As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::
+
+ >>> root.tag
+ 'data'
+ >>> root.attrib
+ {}
+
+It also has children nodes over which we can iterate::
+
+ >>> for child in root:
+ ... print(child.tag, child.attrib)
+ ...
+ country {'name': 'Liechtenstein'}
+ country {'name': 'Singapore'}
+ country {'name': 'Panama'}
+
+Children are nested, and we can access specific child nodes by index::
+
+ >>> root[0][1].text
+ '2008'
+
+Finding interesting elements
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:class:`Element` has some useful methods that help iterate recursively over all
+the sub-tree below it (its children, their children, and so on). For example,
+:meth:`Element.iter`::
+
+ >>> for neighbor in root.iter('neighbor'):
+ ... print(neighbor.attrib)
+ ...
+ {'name': 'Austria', 'direction': 'E'}
+ {'name': 'Switzerland', 'direction': 'W'}
+ {'name': 'Malaysia', 'direction': 'N'}
+ {'name': 'Costa Rica', 'direction': 'W'}
+ {'name': 'Colombia', 'direction': 'E'}
+
+:meth:`Element.findall` finds only elements with a tag which are direct
+children of the current element. :meth:`Element.find` finds the *first* child
+with a particular tag, and :meth:`Element.text` accesses the element's text
+content. :meth:`Element.get` accesses the element's attributes::
+
+ >>> for country in root.findall('country'):
+ ... rank = country.find('rank').text
+ ... name = country.get('name')
+ ... print(name, rank)
+ ...
+ Liechtenstein 1
+ Singapore 4
+ Panama 68
+
+More sophisticated specification of which elements to look for is possible by
+using :ref:`XPath <elementtree-xpath>`.
+
+Modifying an XML File
+^^^^^^^^^^^^^^^^^^^^^
+
+:class:`ElementTree` provides a simple way to build XML documents and write them to files.
+The :meth:`ElementTree.write` method serves this purpose.
+
+Once created, an :class:`Element` object may be manipulated by directly changing
+its fields (such as :attr:`Element.text`), adding and modifying attributes
+(:meth:`Element.set` method), as well as adding new children (for example
+with :meth:`Element.append`).
+
+Let's say we want to add one to each country's rank, and add an ``updated``
+attribute to the rank element::
+
+ >>> for rank in root.iter('rank'):
+ ... new_rank = int(rank.text) + 1
+ ... rank.text = str(new_rank)
+ ... rank.set('updated', 'yes')
+ ...
+ >>> tree.write('output.xml')
+
+Our XML now looks like this:
+
+.. code-block:: xml
+
+ <?xml version="1.0"?>
+ <data>
+ <country name="Liechtenstein">
+ <rank updated="yes">2</rank>
+ <year>2008</year>
+ <gdppc>141100</gdppc>
+ <neighbor name="Austria" direction="E"/>
+ <neighbor name="Switzerland" direction="W"/>
+ </country>
+ <country name="Singapore">
+ <rank updated="yes">5</rank>
+ <year>2011</year>
+ <gdppc>59900</gdppc>
+ <neighbor name="Malaysia" direction="N"/>
+ </country>
+ <country name="Panama">
+ <rank updated="yes">69</rank>
+ <year>2011</year>
+ <gdppc>13600</gdppc>
+ <neighbor name="Costa Rica" direction="W"/>
+ <neighbor name="Colombia" direction="E"/>
+ </country>
+ </data>
+
+We can remove elements using :meth:`Element.remove`. Let's say we want to
+remove all countries with a rank higher than 50::
+
+ >>> for country in root.findall('country'):
+ ... rank = int(country.find('rank').text)
+ ... if rank > 50:
+ ... root.remove(country)
+ ...
+ >>> tree.write('output.xml')
+
+Our XML now looks like this:
+
+.. code-block:: xml
+
+ <?xml version="1.0"?>
+ <data>
+ <country name="Liechtenstein">
+ <rank updated="yes">2</rank>
+ <year>2008</year>
+ <gdppc>141100</gdppc>
+ <neighbor name="Austria" direction="E"/>
+ <neighbor name="Switzerland" direction="W"/>
+ </country>
+ <country name="Singapore">
+ <rank updated="yes">5</rank>
+ <year>2011</year>
+ <gdppc>59900</gdppc>
+ <neighbor name="Malaysia" direction="N"/>
+ </country>
+ </data>
+
+Building XML documents
+^^^^^^^^^^^^^^^^^^^^^^
+
+The :func:`SubElement` function also provides a convenient way to create new
+sub-elements for a given element::
+
+ >>> a = ET.Element('a')
+ >>> b = ET.SubElement(a, 'b')
+ >>> c = ET.SubElement(a, 'c')
+ >>> d = ET.SubElement(c, 'd')
+ >>> ET.dump(a)
+ <a><b /><c><d /></c></a>
+
+Additional resources
+^^^^^^^^^^^^^^^^^^^^
---------------
-
-The :class:`Element` type is a flexible container object, designed to store
-hierarchical data structures in memory. The type can be described as a cross
-between a list and a dictionary.
-
-Each element has a number of properties associated with it:
-
-* a tag which is a string identifying what kind of data this element represents
- (the element type, in other words).
-
-* a number of attributes, stored in a Python dictionary.
-
-* a text string.
-
-* an optional tail string.
-
-* a number of child elements, stored in a Python sequence
-
-To create an element instance, use the :class:`Element` constructor or the
-:func:`SubElement` factory function.
-
-The :class:`ElementTree` class can be used to wrap an element structure, and
-convert it from and to XML.
+See http://effbot.org/zone/element-index.htm for tutorials and links to other
+docs.
-A C implementation of this API is available as :mod:`xml.etree.cElementTree`.
-See http://effbot.org/zone/element-index.htm for tutorials and links to other
-docs. Fredrik Lundh's page is also the location of the development version of
-the xml.etree.ElementTree.
+.. _elementtree-xpath:
-.. versionchanged:: 3.2
- The ElementTree API is updated to 1.3. For more information, see
- `Introducing ElementTree 1.3
- <http://effbot.org/zone/elementtree-13-intro.htm>`_.
+XPath support
+-------------
+This module provides limited support for
+`XPath expressions <http://www.w3.org/TR/xpath>`_ for locating elements in a
+tree. The goal is to support a small subset of the abbreviated syntax; a full
+XPath engine is outside the scope of the module.
+
+Example
+^^^^^^^
+
+Here's an example that demonstrates some of the XPath capabilities of the
+module. We'll be using the ``countrydata`` XML document from the
+:ref:`Parsing XML <elementtree-parsing-xml>` section::
+
+ import xml.etree.ElementTree as ET
+
+ root = ET.fromstring(countrydata)
+
+ # Top-level elements
+ root.findall(".")
+
+ # All 'neighbor' grand-children of 'country' children of the top-level
+ # elements
+ root.findall("./country/neighbor")
+
+ # Nodes with name='Singapore' that have a 'year' child
+ root.findall(".//year/..[@name='Singapore']")
+
+ # 'year' nodes that are children of nodes with name='Singapore'
+ root.findall(".//*[@name='Singapore']/year")
+
+ # All 'neighbor' nodes that are the second child of their parent
+ root.findall(".//neighbor[2]")
+
+Supported XPath syntax
+^^^^^^^^^^^^^^^^^^^^^^
+
++-----------------------+------------------------------------------------------+
+| Syntax | Meaning |
++=======================+======================================================+
+| ``tag`` | Selects all child elements with the given tag. |
+| | For example, ``spam`` selects all child elements |
+| | named ``spam``, ``spam/egg`` selects all |
+| | grandchildren named ``egg`` in all children named |
+| | ``spam``. |
++-----------------------+------------------------------------------------------+
+| ``*`` | Selects all child elements. For example, ``*/egg`` |
+| | selects all grandchildren named ``egg``. |
++-----------------------+------------------------------------------------------+
+| ``.`` | Selects the current node. This is mostly useful |
+| | at the beginning of the path, to indicate that it's |
+| | a relative path. |
++-----------------------+------------------------------------------------------+
+| ``//`` | Selects all subelements, on all levels beneath the |
+| | current element. For example, ``.//egg`` selects |
+| | all ``egg`` elements in the entire tree. |
++-----------------------+------------------------------------------------------+
+| ``..`` | Selects the parent element. Returns ``None`` if the |
+| | path attempts to reach the ancestors of the start |
+| | element (the element ``find`` was called on). |
++-----------------------+------------------------------------------------------+
+| ``[@attrib]`` | Selects all elements that have the given attribute. |
++-----------------------+------------------------------------------------------+
+| ``[@attrib='value']`` | Selects all elements for which the given attribute |
+| | has the given value. The value cannot contain |
+| | quotes. |
++-----------------------+------------------------------------------------------+
+| ``[tag]`` | Selects all elements that have a child named |
+| | ``tag``. Only immediate children are supported. |
++-----------------------+------------------------------------------------------+
+| ``[position]`` | Selects all elements that are located at the given |
+| | position. The position can be either an integer |
+| | (1 is the first position), the expression ``last()`` |
+| | (for the last position), or a position relative to |
+| | the last position (e.g. ``last()-1``). |
++-----------------------+------------------------------------------------------+
+
+Predicates (expressions within square brackets) must be preceded by a tag
+name, an asterisk, or another predicate. ``position`` predicates must be
+preceded by a tag name.
+
+Reference
+---------
.. _elementtree-functions:
Functions
----------
+^^^^^^^^^
.. function:: Comment(text=None)
@@ -101,9 +375,8 @@ Functions
and ``"end-ns"`` (the "ns" events are used to get detailed namespace
information). If *events* is omitted, only ``"end"`` events are reported.
*parser* is an optional parser instance. If not given, the standard
- :class:`XMLParser` parser is used. *parser* is not supported by
- ``cElementTree``. Returns an :term:`iterator` providing ``(event, elem)``
- pairs.
+ :class:`XMLParser` parser is used. Returns an :term:`iterator` providing
+ ``(event, elem)`` pairs.
.. note::
@@ -160,9 +433,9 @@ Functions
Generates a string representation of an XML element, including all
subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
- generate a Unicode string. *method* is either ``"xml"``,
- ``"html"`` or ``"text"`` (default is ``"xml"``). Returns an (optionally)
- encoded string containing the XML data.
+ generate a Unicode string (otherwise, a bytestring is generated). *method*
+ is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
+ Returns an (optionally) encoded string containing the XML data.
.. function:: tostringlist(element, encoding="us-ascii", method="xml")
@@ -170,11 +443,11 @@ Functions
Generates a string representation of an XML element, including all
subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
- generate a Unicode string. *method* is either ``"xml"``,
- ``"html"`` or ``"text"`` (default is ``"xml"``). Returns a list of
- (optionally) encoded strings containing the XML data. It does not guarantee
- any specific sequence, except that ``"".join(tostringlist(element)) ==
- tostring(element)``.
+ generate a Unicode string (otherwise, a bytestring is generated). *method*
+ is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
+ Returns a list of (optionally) encoded strings containing the XML data.
+ It does not guarantee any specific sequence, except that
+ ``"".join(tostringlist(element)) == tostring(element)``.
.. versionadded:: 3.2
@@ -199,8 +472,7 @@ Functions
.. _elementtree-element-objects:
Element Objects
----------------
-
+^^^^^^^^^^^^^^^
.. class:: Element(tag, attrib={}, **extra)
@@ -251,7 +523,7 @@ Element Objects
.. method:: clear()
Resets an element. This function removes all subelements, clears all
- attributes, and sets the text and tail attributes to None.
+ attributes, and sets the text and tail attributes to ``None``.
.. method:: get(key, default=None)
@@ -282,36 +554,43 @@ Element Objects
.. method:: append(subelement)
- Adds the element *subelement* to the end of this elements internal list
- of subelements.
+ Adds the element *subelement* to the end of this element's internal list
+ of subelements. Raises :exc:`TypeError` if *subelement* is not an
+ :class:`Element`.
.. method:: extend(subelements)
Appends *subelements* from a sequence object with zero or more elements.
- Raises :exc:`AssertionError` if a subelement is not a valid object.
+ Raises :exc:`TypeError` if a subelement is not an :class:`Element`.
.. versionadded:: 3.2
- .. method:: find(match)
+ .. method:: find(match, namespaces=None)
Finds the first subelement matching *match*. *match* may be a tag name
- or path. Returns an element instance or ``None``.
+ or a :ref:`path <elementtree-xpath>`. Returns an element instance
+ or ``None``. *namespaces* is an optional mapping from namespace prefix
+ to full name.
- .. method:: findall(match)
+ .. method:: findall(match, namespaces=None)
- Finds all matching subelements, by tag name or path. Returns a list
- containing all matching elements in document order.
+ Finds all matching subelements, by tag name or
+ :ref:`path <elementtree-xpath>`. Returns a list containing all matching
+ elements in document order. *namespaces* is an optional mapping from
+ namespace prefix to full name.
- .. method:: findtext(match, default=None)
+ .. method:: findtext(match, default=None, namespaces=None)
Finds text for the first subelement matching *match*. *match* may be
- a tag name or path. Returns the text content of the first matching
- element, or *default* if no element was found. Note that if the matching
- element has no text content an empty string is returned.
+ a tag name or a :ref:`path <elementtree-xpath>`. Returns the text content
+ of the first matching element, or *default* if no element was found.
+ Note that if the matching element has no text content an empty string
+ is returned. *namespaces* is an optional mapping from namespace prefix
+ to full name.
.. method:: getchildren()
@@ -326,9 +605,10 @@ Element Objects
Use method :meth:`Element.iter` instead.
- .. method:: insert(index, element)
+ .. method:: insert(index, subelement)
- Inserts a subelement at the given position in this element.
+ Inserts *subelement* at the given position in this element. Raises
+ :exc:`TypeError` if *subelement* is not an :class:`Element`.
.. method:: iter(tag=None)
@@ -342,10 +622,13 @@ Element Objects
.. versionadded:: 3.2
- .. method:: iterfind(match)
+ .. method:: iterfind(match, namespaces=None)
+
+ Finds all matching subelements, by tag name or
+ :ref:`path <elementtree-xpath>`. Returns an iterable yielding all
+ matching elements in document order. *namespaces* is an optional mapping
+ from namespace prefix to full name.
- Finds all matching subelements, by tag name or path. Returns an iterable
- yielding all matching elements in document order.
.. versionadded:: 3.2
@@ -390,7 +673,7 @@ Element Objects
.. _elementtree-elementtree-objects:
ElementTree Objects
--------------------
+^^^^^^^^^^^^^^^^^^^
.. class:: ElementTree(element=None, file=None)
@@ -410,17 +693,17 @@ ElementTree Objects
care. *element* is an element instance.
- .. method:: find(match)
+ .. method:: find(match, namespaces=None)
Same as :meth:`Element.find`, starting at the root of the tree.
- .. method:: findall(match)
+ .. method:: findall(match, namespaces=None)
Same as :meth:`Element.findall`, starting at the root of the tree.
- .. method:: findtext(match, default=None)
+ .. method:: findtext(match, default=None, namespaces=None)
Same as :meth:`Element.findtext`, starting at the root of the tree.
@@ -443,11 +726,9 @@ ElementTree Objects
to look for (default is to return all elements)
- .. method:: iterfind(match)
+ .. method:: iterfind(match, namespaces=None)
- Finds all matching subelements, by tag name or path. Same as
- getroot().iterfind(match). Returns an iterable yielding all matching
- elements in document order.
+ Same as :meth:`Element.iterfind`, starting at the root of the tree.
.. versionadded:: 3.2
@@ -456,8 +737,8 @@ ElementTree Objects
Loads an external XML section into this element tree. *source* is a file
name or :term:`file object`. *parser* is an optional parser instance.
- If not given, the standard XMLParser parser is used. Returns the section
- root element.
+ If not given, the standard :class:`XMLParser` parser is used. Returns the
+ section root element.
.. method:: write(file, encoding="us-ascii", xml_declaration=None, \
@@ -465,13 +746,21 @@ ElementTree Objects
Writes the element tree to a file, as XML. *file* is a file name, or a
:term:`file object` opened for writing. *encoding* [1]_ is the output
- encoding (default is US-ASCII). Use ``encoding="unicode"`` to write a
- Unicode string. *xml_declaration* controls if an XML declaration
- should be added to the file. Use False for never, True for always, None
- for only if not US-ASCII or UTF-8 or Unicode (default is None).
+ encoding (default is US-ASCII).
+ *xml_declaration* controls if an XML declaration should be added to the
+ file. Use ``False`` for never, ``True`` for always, ``None``
+ for only if not US-ASCII or UTF-8 or Unicode (default is ``None``).
*default_namespace* sets the default XML namespace (for "xmlns").
*method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
- ``"xml"``). Returns an (optionally) encoded string.
+ ``"xml"``).
+
+ The output is either a string (:class:`str`) or binary (:class:`bytes`).
+ This is controlled by the *encoding* argument. If *encoding* is
+ ``"unicode"``, the output is a string; otherwise, it's binary. Note that
+ this may conflict with the type of *file* if it's an open
+ :term:`file object`; make sure you do not try to write a string to a
+ binary stream and vice versa.
+
This is the XML file that is going to be manipulated::
@@ -504,7 +793,7 @@ Example of changing the attribute "target" of every link in first paragraph::
.. _elementtree-qname-objects:
QName Objects
--------------
+^^^^^^^^^^^^^
.. class:: QName(text_or_uri, tag=None)
@@ -520,7 +809,7 @@ QName Objects
.. _elementtree-treebuilder-objects:
TreeBuilder Objects
--------------------
+^^^^^^^^^^^^^^^^^^^
.. class:: TreeBuilder(element_factory=None)
@@ -528,9 +817,9 @@ TreeBuilder Objects
Generic element structure builder. This builder converts a sequence of
start, data, and end method calls to a well-formed element structure. You
can use this class to build an element structure using a custom XML parser,
- or a parser for some other XML-like format. The *element_factory* is called
- to create new :class:`Element` instances when given.
-
+ or a parser for some other XML-like format. *element_factory*, when given,
+ must be a callable accepting two positional arguments: a tag and
+ a dict of attributes. It is expected to return a new element instance.
.. method:: close()
@@ -571,7 +860,7 @@ TreeBuilder Objects
.. _elementtree-xmlparser-objects:
XMLParser Objects
------------------
+^^^^^^^^^^^^^^^^^
.. class:: XMLParser(html=0, target=None, encoding=None)
@@ -579,9 +868,9 @@ XMLParser Objects
:class:`Element` structure builder for XML source data, based on the expat
parser. *html* are predefined HTML entities. This flag is not supported by
the current implementation. *target* is the target object. If omitted, the
- builder uses an instance of the standard TreeBuilder class. *encoding* [1]_
- is optional. If given, the value overrides the encoding specified in the
- XML file.
+ builder uses an instance of the standard :class:`TreeBuilder` class.
+ *encoding* [1]_ is optional. If given, the value overrides the encoding
+ specified in the XML file.
.. method:: close()
@@ -639,6 +928,24 @@ This is an example of counting the maximum depth of an XML file::
>>> parser.close()
4
+Exceptions
+^^^^^^^^^^
+
+.. class:: ParseError
+
+ XML parse error, raised by the various parsing methods in this module when
+ parsing fails. The string representation of an instance of this exception
+ will contain a user-friendly error message. In addition, it will have
+ the following attributes available:
+
+ .. attribute:: code
+
+ A numeric error code from the expat parser. See the documentation of
+ :mod:`xml.parsers.expat` for the list of error codes and their meanings.
+
+ .. attribute:: position
+
+ A tuple of *line*, *column* numbers, specifying where the error occurred.
.. rubric:: Footnotes