summaryrefslogtreecommitdiffstats
path: root/Doc/library/xml.dom.pulldom.rst
blob: d1df465a598e5324949a34a77baf79e01791cb10 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
:mod:`xml.dom.pulldom` --- Support for building partial DOM trees
=================================================================

.. module:: xml.dom.pulldom
   :synopsis: Support for building partial DOM trees from SAX events.

.. moduleauthor:: Paul Prescod <paul@prescod.net>

**Source code:** :source:`Lib/xml/dom/pulldom.py`

--------------

The :mod:`xml.dom.pulldom` module provides a "pull parser" which can also be
asked to produce DOM-accessible fragments of the document where necessary. The
basic concept involves pulling "events" from a stream of incoming XML and
processing them. In contrast to SAX which also employs an event-driven
processing model together with callbacks, the user of a pull parser is
responsible for explicitly pulling events from the stream, looping over those
events until either processing is finished or an error condition occurs.


.. warning::

   The :mod:`xml.dom.pulldom` module is not secure against
   maliciously constructed data.  If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.

.. versionchanged:: 3.7.1

   The SAX parser no longer processes general external entities by default to
   increase security by default. To enable processing of external entities,
   pass a custom parser instance in::

      from xml.dom.pulldom import parse
      from xml.sax import make_parser
      from xml.sax.handler import feature_external_ges

      parser = make_parser()
      parser.setFeature(feature_external_ges, True)
      parse(filename, parser=parser)


Example::

   from xml.dom import pulldom

   doc = pulldom.parse('sales_items.xml')
   for event, node in doc:
       if event == pulldom.START_ELEMENT and node.tagName == 'item':
           if int(node.getAttribute('price')) > 50:
               doc.expandNode(node)
               print(node.toxml())

``event`` is a constant and can be one of:

* :data:`START_ELEMENT`
* :data:`END_ELEMENT`
* :data:`COMMENT`
* :data:`START_DOCUMENT`
* :data:`END_DOCUMENT`
* :data:`CHARACTERS`
* :data:`PROCESSING_INSTRUCTION`
* :data:`IGNORABLE_WHITESPACE`

``node`` is an object of type :class:`xml.dom.minidom.Document`,
:class:`xml.dom.minidom.Element` or :class:`xml.dom.minidom.Text`.

Since the document is treated as a "flat" stream of events, the document "tree"
is implicitly traversed and the desired elements are found regardless of their
depth in the tree. In other words, one does not need to consider hierarchical
issues such as recursive searching of the document nodes, although if the
context of elements were important, one would either need to maintain some
context-related state (i.e. remembering where one is in the document at any
given point) or to make use of the :func:`DOMEventStream.expandNode` method
and switch to DOM-related processing.


.. class:: PullDom(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. class:: SAX2DOM(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. function:: parse(stream_or_string, parser=None, bufsize=None)

   Return a :class:`DOMEventStream` from the given input. *stream_or_string* may be
   either a file name, or a file-like object. *parser*, if given, must be an
   :class:`~xml.sax.xmlreader.XMLReader` object. This function will change the
   document handler of the
   parser and activate namespace support; other parser configuration (like
   setting an entity resolver) must have been done in advance.

If you have XML in a string, you can use the :func:`parseString` function instead:

.. function:: parseString(string, parser=None)

   Return a :class:`DOMEventStream` that represents the (Unicode) *string*.

.. data:: default_bufsize

   Default value for the *bufsize* parameter to :func:`parse`.

   The value of this variable can be changed before calling :func:`parse` and
   the new value will take effect.

.. _domeventstream-objects:

DOMEventStream Objects
----------------------

.. class:: DOMEventStream(stream, parser, bufsize)

   .. versionchanged:: 3.11
      Support for :meth:`__getitem__` method has been removed.

   .. method:: getEvent()

      Return a tuple containing *event* and the current *node* as
      :class:`xml.dom.minidom.Document` if event equals :data:`START_DOCUMENT`,
      :class:`xml.dom.minidom.Element` if event equals :data:`START_ELEMENT` or
      :data:`END_ELEMENT` or :class:`xml.dom.minidom.Text` if event equals
      :data:`CHARACTERS`.
      The current node does not contain information about its children, unless
      :func:`expandNode` is called.

   .. method:: expandNode(node)

      Expands all children of *node* into *node*. Example::

          from xml.dom import pulldom

          xml = '<html><title>Foo</title> <p>Some text <div>and more</div></p> </html>'
          doc = pulldom.parseString(xml)
          for event, node in doc:
              if event == pulldom.START_ELEMENT and node.tagName == 'p':
                  # Following statement only prints '<p/>'
                  print(node.toxml())
                  doc.expandNode(node)
                  # Following statement prints node with all its children '<p>Some text <div>and more</div></p>'
                  print(node.toxml())

   .. method:: DOMEventStream.reset()