diff options
Diffstat (limited to 'Doc/library/xml.etree.elementtree.rst')
-rw-r--r-- | Doc/library/xml.etree.elementtree.rst | 169 |
1 files changed, 144 insertions, 25 deletions
diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst index cf65219..4c89dc3 100644 --- a/Doc/library/xml.etree.elementtree.rst +++ b/Doc/library/xml.etree.elementtree.rst @@ -105,6 +105,43 @@ Children are nested, and we can access specific child nodes by index:: >>> root[0][1].text '2008' +Pull API for non-blocking parsing +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Most parsing functions provided by this module require to read the whole +document at once before returning any result. It is possible to use a +:class:`XMLParser` and feed data into it incrementally, but it's a push API that +calls methods on a callback target, which is too low-level and inconvenient for +most needs. Sometimes what the user really wants is to be able to parse XML +incrementally, without blocking operations, while enjoying the convenience of +fully constructed :class:`Element` objects. + +The most powerful tool for doing this is :class:`XMLPullParser`. It does not +require a blocking read to obtain the XML data, and is instead fed with data +incrementally with :meth:`XMLPullParser.feed` calls. To get the parsed XML +elements, call :meth:`XMLPullParser.read_events`. Here's an example:: + + >>> parser = ET.XMLPullParser(['start', 'end']) + >>> parser.feed('<mytag>sometext') + >>> list(parser.read_events()) + [('start', <Element 'mytag' at 0x7fa66db2be58>)] + >>> parser.feed(' more text</mytag>') + >>> for event, elem in parser.read_events(): + ... print(event) + ... print(elem.tag, 'text=', elem.text) + ... + end + +The obvious use case is applications that operate in a non-blocking fashion +where the XML data is being received from a socket or read incrementally from +some storage device. In such cases, blocking reads are unacceptable. + +Because it's so flexible, :class:`XMLPullParser` can be inconvenient to use for +simpler use-cases. If you don't mind your application blocking on reading XML +data but would still like to have incremental parsing capabilities, take a look +at :func:`iterparse`. It can be useful when you're reading a large XML document +and don't want to hold it wholly in memory. + Finding interesting elements ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -379,25 +416,32 @@ Functions Parses an XML section into an element tree incrementally, and reports what's going on to the user. *source* is a filename or :term:`file object` - containing XML data. *events* is a tuple of events to report back. The - supported events are the strings ``"start"``, ``"end"``, ``"start-ns"`` - and ``"end-ns"`` (the "ns" events are used to get detailed namespace + containing XML data. *events* is a sequence of events to report back. The + supported events are the strings ``"start"``, ``"end"``, ``"start-ns"`` and + ``"end-ns"`` (the "ns" events are used to get detailed namespace information). If *events* is omitted, only ``"end"`` events are reported. *parser* is an optional parser instance. If not given, the standard - :class:`XMLParser` parser is used. *parser* can only use the default - :class:`TreeBuilder` as a target. Returns an :term:`iterator` providing - ``(event, elem)`` pairs. + :class:`XMLParser` parser is used. *parser* must be a subclass of + :class:`XMLParser` and can only use the default :class:`TreeBuilder` as a + target. Returns an :term:`iterator` providing ``(event, elem)`` pairs. + + Note that while :func:`iterparse` builds the tree incrementally, it issues + blocking reads on *source* (or the file it names). As such, it's unsuitable + for applications where blocking reads can't be made. For fully non-blocking + parsing, see :class:`XMLPullParser`. .. note:: - :func:`iterparse` only guarantees that it has seen the ">" - character of a starting tag when it emits a "start" event, so the - attributes are defined, but the contents of the text and tail attributes - are undefined at that point. The same applies to the element children; - they may or may not be present. + :func:`iterparse` only guarantees that it has seen the ">" character of a + starting tag when it emits a "start" event, so the attributes are defined, + but the contents of the text and tail attributes are undefined at that + point. The same applies to the element children; they may or may not be + present. If you need a fully populated element, look for "end" events instead. + .. deprecated:: 3.4 + The *parser* argument. .. function:: parse(source, parser=None) @@ -438,29 +482,39 @@ Functions arguments. Returns an element instance. -.. function:: tostring(element, encoding="us-ascii", method="xml") +.. function:: tostring(element, encoding="us-ascii", method="xml", *, \ + short_empty_elements=True) Generates a string representation of an XML element, including all subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to generate a Unicode string (otherwise, a bytestring is generated). *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``). + *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`. Returns an (optionally) encoded string containing the XML data. + .. versionadded:: 3.4 + The *short_empty_elements* parameter. + -.. function:: tostringlist(element, encoding="us-ascii", method="xml") +.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \ + short_empty_elements=True) Generates a string representation of an XML element, including all subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to generate a Unicode string (otherwise, a bytestring is generated). *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``). + *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`. Returns a list of (optionally) encoded strings containing the XML data. It does not guarantee any specific sequence, except that ``"".join(tostringlist(element)) == tostring(element)``. .. versionadded:: 3.2 + .. versionadded:: 3.4 + The *short_empty_elements* parameter. + .. function:: XML(text, parser=None) @@ -753,7 +807,8 @@ ElementTree Objects .. method:: write(file, encoding="us-ascii", xml_declaration=None, \ - default_namespace=None, method="xml") + default_namespace=None, method="xml", *, \ + short_empty_elements=True) Writes the element tree to a file, as XML. *file* is a file name, or a :term:`file object` opened for writing. *encoding* [1]_ is the output @@ -764,6 +819,10 @@ ElementTree Objects *default_namespace* sets the default XML namespace (for "xmlns"). *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``). + The keyword-only *short_empty_elements* parameter controls the formatting + of elements that contain no content. If *True* (the default), they are + emitted as a single self-closed tag, otherwise they are emitted as a pair + of start/end tags. The output is either a string (:class:`str`) or binary (:class:`bytes`). This is controlled by the *encoding* argument. If *encoding* is @@ -772,6 +831,9 @@ ElementTree Objects :term:`file object`; make sure you do not try to write a string to a binary stream and vice versa. + .. versionadded:: 3.4 + The *short_empty_elements* parameter. + This is the XML file that is going to be manipulated:: @@ -817,6 +879,7 @@ QName Objects :class:`QName` instances are opaque. + .. _elementtree-treebuilder-objects: TreeBuilder Objects @@ -876,13 +939,17 @@ XMLParser Objects .. class:: XMLParser(html=0, target=None, encoding=None) - :class:`Element` structure builder for XML source data, based on the expat - parser. *html* are predefined HTML entities. This flag is not supported by - the current implementation. *target* is the target object. If omitted, the - builder uses an instance of the standard :class:`TreeBuilder` class. - *encoding* [1]_ is optional. If given, the value overrides the encoding + This class is the low-level building block of the module. It uses + :mod:`xml.parsers.expat` for efficient, event-based parsing of XML. It can + be fed XML data incrementall with the :meth:`feed` method, and parsing events + are translated to a push API - by invoking callbacks on the *target* object. + If *target* is omitted, the standard :class:`TreeBuilder` is used. The + *html* argument was historically used for backwards compatibility and is now + deprecated. If *encoding* [1]_ is given, the value overrides the encoding specified in the XML file. + .. deprecated:: 3.4 + The *html* argument. .. method:: close() @@ -902,12 +969,12 @@ XMLParser Objects Feeds data to the parser. *data* is encoded data. - :meth:`XMLParser.feed` calls *target*\'s ``start()`` method - for each opening tag, its ``end()`` method for each closing tag, - and data is processed by method ``data()``. :meth:`XMLParser.close` - calls *target*\'s method ``close()``. - :class:`XMLParser` can be used not only for building a tree structure. - This is an example of counting the maximum depth of an XML file:: + :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method + for each opening tag, its ``end(tag)`` method for each closing tag, and data + is processed by method ``data(data)``. :meth:`XMLParser.close` calls + *target*\'s method ``close()``. :class:`XMLParser` can be used not only for + building a tree structure. This is an example of counting the maximum depth + of an XML file:: >>> from xml.etree.ElementTree import XMLParser >>> class MaxDepth: # The target object of the parser @@ -941,6 +1008,58 @@ XMLParser Objects >>> parser.close() 4 + +.. _elementtree-xmlpullparser-objects: + +XMLPullParser Objects +^^^^^^^^^^^^^^^^^^^^^ + +.. class:: XMLPullParser(events=None) + + A pull parser suitable for non-blocking applications. Its input-side API is + similar to that of :class:`XMLParser`, but instead of pushing calls to a + callback target, :class:`XMLPullParser` collects an internal list of parsing + events and lets the user read from it. *events* is a sequence of events to + report back. The supported events are the strings ``"start"``, ``"end"``, + ``"start-ns"`` and ``"end-ns"`` (the "ns" events are used to get detailed + namespace information). If *events* is omitted, only ``"end"`` events are + reported. + + .. method:: feed(data) + + Feed the given bytes data to the parser. + + .. method:: close() + + Signal the parser that the data stream is terminated. Unlike + :meth:`XMLParser.close`, this method always returns :const:`None`. + Any events not yet retrieved when the parser is closed can still be + read with :meth:`read_events`. + + .. method:: read_events() + + Iterate over the events which have been encountered in the data fed to the + parser. This method yields ``(event, elem)`` pairs, where *event* is a + string representing the type of event (e.g. ``"end"``) and *elem* is the + encountered :class:`Element` object. + + Events provided in a previous call to :meth:`read_events` will not be + yielded again. As events are consumed from the internal queue only as + they are retrieved from the iterator, multiple readers calling + :meth:`read_events` in parallel will have unpredictable results. + + .. note:: + + :class:`XMLPullParser` only guarantees that it has seen the ">" + character of a starting tag when it emits a "start" event, so the + attributes are defined, but the contents of the text and tail attributes + are undefined at that point. The same applies to the element children; + they may or may not be present. + + If you need a fully populated element, look for "end" events instead. + + .. versionadded:: 3.4 + Exceptions ^^^^^^^^^^ |