diff options
Diffstat (limited to 'Doc/library/xml.etree.elementtree.rst')
-rw-r--r-- | Doc/library/xml.etree.elementtree.rst | 107 |
1 files changed, 102 insertions, 5 deletions
diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst index 45a0e7b..d0bfed0 100644 --- a/Doc/library/xml.etree.elementtree.rst +++ b/Doc/library/xml.etree.elementtree.rst @@ -105,6 +105,38 @@ Children are nested, and we can access specific child nodes by index:: >>> root[0][1].text '2008' +Incremental parsing +^^^^^^^^^^^^^^^^^^^ + +It's possible to parse XML incrementally (i.e. not the whole document at once). +The most powerful tool for doing this is :class:`IncrementalParser`. It does +not require a blocking read to obtain the XML data, and is instead fed with +data incrementally with :meth:`IncrementalParser.data_received` calls. To get +the parsed XML elements, call :meth:`IncrementalParser.events`. Here's an +example:: + + >>> incparser = ET.IncrementalParser(['start', 'end']) + >>> incparser.data_received('<mytag>sometext') + >>> list(incparser.events()) + [('start', <Element 'mytag' at 0x7fba3f2a8688>)] + >>> incparser.data_received(' more text</mytag>') + >>> for event, elem in incparser.events(): + ... print(event) + ... print(elem.tag, 'text=', elem.text) + ... + end + mytag text= sometext more text + +The obvious use case is applications that operate in an asynchronous fashion +where the XML data is being received from a socket or read incrementally from +some storage device. In such cases, blocking reads are unacceptable. + +Because it's so flexible, :class:`IncrementalParser` can be inconvenient +to use for simpler use-cases. If you don't mind your application blocking on +reading XML data but would still like to have incremental parsing capabilities, +take a look at :func:`iterparse`. It can be useful when you're reading a large +XML document and don't want to hold it wholly in memory. + Finding interesting elements ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -379,7 +411,7 @@ Functions Parses an XML section into an element tree incrementally, and reports what's going on to the user. *source* is a filename or :term:`file object` - containing XML data. *events* is a list of events to report back. The + containing XML data. *events* is a sequence of events to report back. The supported events are the strings ``"start"``, ``"end"``, ``"start-ns"`` and ``"end-ns"`` (the "ns" events are used to get detailed namespace information). If *events* is omitted, only ``"end"`` events are reported. @@ -387,6 +419,11 @@ Functions :class:`XMLParser` parser is used. Returns an :term:`iterator` providing ``(event, elem)`` pairs. + Note that while :func:`iterparse` builds the tree incrementally, it issues + blocking reads on *source* (or the file it names). As such, it's unsuitable + for asynchronous applications where blocking reads can't be made. For fully + asynchronous parsing, see :class:`IncrementalParser`. + .. note:: :func:`iterparse` only guarantees that it has seen the ">" @@ -397,7 +434,6 @@ Functions If you need a fully populated element, look for "end" events instead. - .. function:: parse(source, parser=None) Parses an XML section into an element tree. *source* is a filename or file @@ -437,29 +473,39 @@ Functions arguments. Returns an element instance. -.. function:: tostring(element, encoding="us-ascii", method="xml") +.. function:: tostring(element, encoding="us-ascii", method="xml", *, \ + short_empty_elements=True) Generates a string representation of an XML element, including all subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to generate a Unicode string (otherwise, a bytestring is generated). *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``). + *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`. Returns an (optionally) encoded string containing the XML data. + .. versionadded:: 3.4 + The *short_empty_elements* parameter. -.. function:: tostringlist(element, encoding="us-ascii", method="xml") + +.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \ + short_empty_elements=True) Generates a string representation of an XML element, including all subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to generate a Unicode string (otherwise, a bytestring is generated). *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``). + *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`. Returns a list of (optionally) encoded strings containing the XML data. It does not guarantee any specific sequence, except that ``"".join(tostringlist(element)) == tostring(element)``. .. versionadded:: 3.2 + .. versionadded:: 3.4 + The *short_empty_elements* parameter. + .. function:: XML(text, parser=None) @@ -751,7 +797,8 @@ ElementTree Objects .. method:: write(file, encoding="us-ascii", xml_declaration=None, \ - default_namespace=None, method="xml") + default_namespace=None, method="xml", *, \ + short_empty_elements=True) Writes the element tree to a file, as XML. *file* is a file name, or a :term:`file object` opened for writing. *encoding* [1]_ is the output @@ -762,6 +809,10 @@ ElementTree Objects *default_namespace* sets the default XML namespace (for "xmlns"). *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``). + The keyword-only *short_empty_elements* parameter controls the formatting + of elements that contain no content. If *True* (the default), they are + emitted as a single self-closed tag, otherwise they are emitted as a pair + of start/end tags. The output is either a string (:class:`str`) or binary (:class:`bytes`). This is controlled by the *encoding* argument. If *encoding* is @@ -770,6 +821,9 @@ ElementTree Objects :term:`file object`; make sure you do not try to write a string to a binary stream and vice versa. + .. versionadded:: 3.4 + The *short_empty_elements* parameter. + This is the XML file that is going to be manipulated:: @@ -815,6 +869,49 @@ QName Objects :class:`QName` instances are opaque. +IncrementalParser Objects +^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. class:: IncrementalParser(events=None, parser=None) + + An incremental, event-driven parser suitable for non-blocking applications. + *events* is a sequence of events to report back. The supported events are + the strings ``"start"``, ``"end"``, ``"start-ns"`` and ``"end-ns"`` (the "ns" + events are used to get detailed namespace information). If *events* is + omitted, only ``"end"`` events are reported. *parser* is an optional + parser instance. If not given, the standard :class:`XMLParser` parser is + used. + + .. method:: data_received(data) + + Feed the given bytes data to the incremental parser. + + .. method:: eof_received() + + Signal the incremental parser that the data stream is terminated. + + .. method:: events() + + Iterate over the events which have been encountered in the data fed + to the parser. This method yields ``(event, elem)`` pairs, where + *event* is a string representing the type of event (e.g. ``"end"``) + and *elem* is the encountered :class:`Element` object. Events + provided in a previous call to :meth:`events` will not be yielded + again. + + .. note:: + + :class:`IncrementalParser` only guarantees that it has seen the ">" + character of a starting tag when it emits a "start" event, so the + attributes are defined, but the contents of the text and tail attributes + are undefined at that point. The same applies to the element children; + they may or may not be present. + + If you need a fully populated element, look for "end" events instead. + + .. versionadded:: 3.4 + + .. _elementtree-treebuilder-objects: TreeBuilder Objects |