path: root/Doc/library/xml.dom.minidom.rst
Diffstat (limited to 'Doc/library/xml.dom.minidom.rst')
-rw-r--r-- Doc/library/xml.dom.minidom.rst 267
1 files changed, 267 insertions, 0 deletions
diff --git a/Doc/library/xml.dom.minidom.rst b/Doc/library/xml.dom.minidom.rst
new file mode 100644
index 0000000..54c5f3d
--- /dev/null
+++ b/Doc/library/xml.dom.minidom.rst
@@ -0,0 +1,267 @@
+
+:mod:`xml.dom.minidom` --- Lightweight DOM implementation
+=========================================================
+
+.. module:: xml.dom.minidom
+ :synopsis: Lightweight Document Object Model (DOM) implementation.
+.. moduleauthor:: Paul Prescod <paul@prescod.net>
+.. sectionauthor:: Paul Prescod <paul@prescod.net>
+.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>
+
+
+.. versionadded:: 2.0
+
+:mod:`xml.dom.minidom` is a light-weight implementation of the Document Object
+Model interface. It is intended to be simpler than the full DOM and also
+significantly smaller.
+
+DOM applications typically start by parsing some XML into a DOM. With
+:mod:`xml.dom.minidom`, this is done through the parse functions::
+
+ from xml.dom.minidom import parse, parseString
+
+ dom1 = parse('c:\\temp\\mydata.xml') # parse an XML file by name
+
+ datasource = open('c:\\temp\\mydata.xml')
+ dom2 = parse(datasource) # parse an open file
+
+ dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')
+
+The :func:`parse` function can take either a filename or an open file object.
+
+
+.. function:: parse(filename_or_file[, parser])
+
+ Return a :class:`Document` from the given input. *filename_or_file* may be
+ either a file name, or a file-like object. *parser*, if given, must be a SAX2
+ parser object. This function will change the document handler of the parser and
+ activate namespace support; other parser configuration (like setting an entity
+ resolver) must have been done in advance.
+
+If you have XML in a string, you can use the :func:`parseString` function
+instead:
+
+
+.. function:: parseString(string[, parser])
+
+ Return a :class:`Document` that represents the *string*. This function creates a
+ :class:`StringIO` object for the string and passes that on to :func:`parse`.
+
+Both functions return a :class:`Document` object representing the content of the
+document.
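+
+As a minimal sketch of the two entry points, the snippet below parses the same
+markup once from a string and once from a file-like object. The in-memory
+``io.BytesIO`` buffer and the variable names are assumptions made for
+illustration; a real program would usually pass a file name or an open file.

```python
import io
from xml.dom.minidom import parse, parseString

XML = "<myxml>Some data<empty/> some more data</myxml>"

# Parse directly from a string.
dom_from_string = parseString(XML)

# Parse from a file-like object (an in-memory buffer stands in for a file).
dom_from_file = parse(io.BytesIO(XML.encode("utf-8")))

root_a = dom_from_string.documentElement
root_b = dom_from_file.documentElement

# Both calls build an equivalent tree.
same_root = root_a.tagName == root_b.tagName
child_count = len(root_a.childNodes)  # text, <empty/>, text
```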
+
+What the :func:`parse` and :func:`parseString` functions do is connect an XML
+parser with a "DOM builder" that can accept parse events from any SAX parser and
+convert them into a DOM tree. The names of the functions are perhaps misleading,
+but are easy to grasp when learning the interfaces. The parsing of the document
+will be completed before these functions return; it's simply that these
+functions do not provide a parser implementation themselves.
+
+You can also create a :class:`Document` by calling a method on a "DOM
+Implementation" object. You can get this object by calling the
+:func:`getDOMImplementation` function in either the :mod:`xml.dom` package or
+the :mod:`xml.dom.minidom` module. The implementation from the
+:mod:`xml.dom.minidom` module always returns a :class:`Document` instance
+from the minidom implementation, while the version from :mod:`xml.dom` may
+provide an alternate implementation (this is likely if you have the `PyXML
+package <http://pyxml.sourceforge.net/>`_ installed). Once you have a
+:class:`Document`, you can add child nodes to it to populate the DOM::
+
+ from xml.dom.minidom import getDOMImplementation
+
+ impl = getDOMImplementation()
+
+ newdoc = impl.createDocument(None, "some_tag", None)
+ top_element = newdoc.documentElement
+ text = newdoc.createTextNode('Some textual content.')
+ top_element.appendChild(text)
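+
+Building on the same calls, the following sketch adds an element with an
+attribute and then serializes the result. The ``item`` tag and the ``id``
+attribute are hypothetical names chosen only for this example.

```python
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()
newdoc = impl.createDocument(None, "some_tag", None)
top_element = newdoc.documentElement

# Create nodes through the Document's factory methods, then attach them.
item = newdoc.createElement("item")          # hypothetical child element
item.setAttribute("id", "1")                 # hypothetical attribute
item.appendChild(newdoc.createTextNode("Some textual content."))
top_element.appendChild(item)

serialized = top_element.toxml()
```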
+
+Once you have a DOM document object, you can access the parts of your XML
+document through its properties and methods. These properties are defined in
+the DOM specification. The main property of the document object is the
+:attr:`documentElement` property. It gives you the main element in the XML
+document: the one that holds all others. Here is an example program::
+
+ dom3 = parseString("<myxml>Some data</myxml>")
+ assert dom3.documentElement.tagName == "myxml"
+
+When you are finished with a DOM, you should clean it up. This is necessary
+because some versions of Python do not support garbage collection of objects
+that refer to each other in a cycle. Until this restriction is removed from all
+versions of Python, it is safest to write your code as if cycles would not be
+cleaned up.
+
+The way to clean up a DOM is to call its :meth:`unlink` method::
+
+ dom1.unlink()
+ dom2.unlink()
+ dom3.unlink()
+
+:meth:`unlink` is a :mod:`xml.dom.minidom`\ -specific extension to the DOM API.
+After calling :meth:`unlink` on a node, the node and its descendants are
+essentially useless.
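+
+One way to make sure :meth:`unlink` is always called is to wrap the work in a
+``try``/``finally`` block. This is only a sketch; ``root_tag_name`` is a
+hypothetical helper, not part of the module.

```python
from xml.dom.minidom import parseString


def root_tag_name(xml_text):
    """Return the tag name of the root element, then release the DOM."""
    dom = parseString(xml_text)
    try:
        # Copy out plain values before the tree is torn down.
        return dom.documentElement.tagName
    finally:
        # Runs even if an exception is raised while reading the tree.
        dom.unlink()


tag = root_tag_name("<myxml>Some data</myxml>")
```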
+
+
+.. seealso::
+
+ `Document Object Model (DOM) Level 1 Specification <http://www.w3.org/TR/REC-DOM-Level-1/>`_
+ The W3C recommendation for the DOM supported by :mod:`xml.dom.minidom`.
+
+
+.. _minidom-objects:
+
+DOM Objects
+-----------
+
+The definition of the DOM API for Python is given as part of the :mod:`xml.dom`
+module documentation. This section lists the differences between the API and
+:mod:`xml.dom.minidom`.
+
+
+.. method:: Node.unlink()
+
+ Break internal references within the DOM so that it will be garbage collected on
+ versions of Python without cyclic GC. Even when cyclic GC is available, using
+ this can make large amounts of memory available sooner, so calling this on DOM
+ objects as soon as they are no longer needed is good practice. This only needs
+ to be called on the :class:`Document` object, but may be called on child nodes
+ to discard children of that node.
+
+
+.. method:: Node.writexml(writer[,indent=""[,addindent=""[,newl=""]]])
+
+ Write XML to the writer object. The writer should have a :meth:`write` method
+ which matches that of the file object interface. The *indent* parameter is the
+ indentation of the current node. The *addindent* parameter is the incremental
+ indentation to use for subnodes of the current one. The *newl* parameter
+ specifies the string used to terminate each line.
+
+ .. versionchanged:: 2.1
+ The optional keyword parameters *indent*, *addindent*, and *newl* were added to
+ support pretty output.
+
+ .. versionchanged:: 2.3
+ For the :class:`Document` node, an additional keyword argument *encoding* can be
+ used to specify the encoding field of the XML header.
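+
+The sketch below writes an element to an in-memory writer. It assumes an
+``io.StringIO`` buffer as the writer object (any object with a suitable
+``write`` method should do) and was written against a current Python release;
+the exact whitespace produced has varied between versions.

```python
import io
from xml.dom.minidom import parseString

dom = parseString("<a><b/></a>")

# Any object with a write() method can act as the writer.
buf = io.StringIO()
dom.documentElement.writexml(buf, indent="", addindent="  ", newl="\n")

pretty = buf.getvalue()
dom.unlink()
```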
+
+
+.. method:: Node.toxml([encoding])
+
+ Return the XML that the DOM represents as a string.
+
+ With no argument, the XML header does not specify an encoding, and the result is
+ a Unicode string if the default encoding cannot represent all characters in the
+ document. Encoding this string in an encoding other than UTF-8 is likely
+ incorrect, since UTF-8 is the default encoding of XML.
+
+ With an explicit *encoding* argument, the result is a byte string in the
+ specified encoding. It is recommended that this argument is always specified. To
+ avoid :exc:`UnicodeError` exceptions in case of unrepresentable text data, the
+ encoding argument should be specified as "utf-8".
+
+ .. versionchanged:: 2.3
+ the *encoding* argument was introduced.
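+
+A short sketch of the difference the *encoding* argument makes. It was written
+against a current Python 3 release, where the call without an argument returns
+text and the call with an encoding returns bytes; the sample document is made
+up for illustration.

```python
from xml.dom.minidom import parseString

dom = parseString("<greeting>caf\u00e9</greeting>")

# No encoding: the XML declaration carries no encoding field.
as_text = dom.toxml()

# Explicit encoding: the result is encoded and the declaration names it.
as_bytes = dom.toxml("utf-8")

dom.unlink()
```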
+
+
+.. method:: Node.toprettyxml([indent[, newl[, encoding]]])
+
+ Return a pretty-printed version of the document. *indent* specifies the
+ indentation string and defaults to a tab character; *newl* specifies the string
+ emitted at the end of each line and defaults to ``\n``.
+
+ .. versionadded:: 2.1
+
+ .. versionchanged:: 2.3
+ the *encoding* argument was introduced; see :meth:`toxml`.
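+
+As a rough illustration, the following sketch pretty-prints a small document
+with a two-space indent instead of the default. The exact placement of text
+nodes in the output has differed between Python versions, so treat the layout
+as indicative only.

```python
from xml.dom.minidom import parseString

dom = parseString("<a><b>text</b></a>")

# Override the default indentation string; keep the default line ending.
pretty = dom.toprettyxml(indent="  ", newl="\n")
lines = pretty.splitlines()

dom.unlink()
```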
+
+The following standard DOM methods have special considerations with
+:mod:`xml.dom.minidom`:
+
+
+.. method:: Node.cloneNode(deep)
+
+ Although this method was present in the version of :mod:`xml.dom.minidom`
+ packaged with Python 2.0, it was seriously broken. This has been corrected for
+ subsequent releases.
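+
+The *deep* flag controls whether descendants are copied along with the node.
+A minimal sketch, run against a current Python release:

```python
from xml.dom.minidom import parseString

dom = parseString("<a><b>text</b></a>")
root = dom.documentElement

shallow = root.cloneNode(False)  # copies the element only
deep = root.cloneNode(True)      # copies the element and its whole subtree

shallow_xml = shallow.toxml()
deep_xml = deep.toxml()
```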
+
+
+.. _dom-example:
+
+DOM Example
+-----------
+
+The following is a fairly realistic example of a simple program. In this
+particular case, we do not take much advantage of the flexibility of the DOM.
+
+.. literalinclude:: ../includes/minidom-example.py
+
+
+.. _minidom-and-dom:
+
+minidom and the DOM standard
+----------------------------
+
+The :mod:`xml.dom.minidom` module is essentially a DOM 1.0-compatible DOM with
+some DOM 2 features (primarily namespace features).
+
+Usage of the DOM interface in Python is straightforward. The following mapping
+rules apply:
+
+* Interfaces are accessed through instance objects. Applications should not
+ instantiate the classes themselves; they should use the creator functions
+ available on the :class:`Document` object. Derived interfaces support all
+ operations (and attributes) from the base interfaces, plus any new operations.
+
+* Operations are used as methods. Since the DOM uses only :keyword:`in`
+ parameters, the arguments are passed in normal order (from left to right).
+ There are no optional arguments. :keyword:`void` operations return ``None``.
+
+* IDL attributes map to instance attributes. For compatibility with the OMG IDL
+ language mapping for Python, an attribute ``foo`` can also be accessed through
+ accessor methods :meth:`_get_foo` and :meth:`_set_foo`. :keyword:`readonly`
+ attributes must not be changed; this is not enforced at runtime.
+
+* The types ``short int``, ``unsigned int``, ``unsigned long long``, and
+ ``boolean`` all map to Python integer objects.
+
+* The type ``DOMString`` maps to Python strings. :mod:`xml.dom.minidom` supports
+ either byte or Unicode strings, but will normally produce Unicode strings.
+ Values of type ``DOMString`` may also be ``None`` where the W3C DOM
+ specification allows the IDL ``null`` value.
+
+* :keyword:`const` declarations map to variables in their respective scope (e.g.
+ ``xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE``); they must not be changed.
+
+* ``DOMException`` is currently not supported in :mod:`xml.dom.minidom`.
+ Instead, :mod:`xml.dom.minidom` uses standard Python exceptions such as
+ :exc:`TypeError` and :exc:`AttributeError`.
+
+* :class:`NodeList` objects are implemented using Python's built-in list type.
+ Starting with Python 2.2, these objects provide the interface defined in the DOM
+ specification, but with earlier versions of Python they do not support the
+ official API. They are, however, much more "Pythonic" than the interface
+ defined in the W3C recommendations.
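+
+Several of these mapping rules can be seen together in a short sketch: IDL
+attributes appear as instance attributes, constants live on the :class:`Node`
+class, and a :class:`NodeList` can be used both through the DOM interface and
+as an ordinary Python sequence. The sample markup is made up for illustration.

```python
from xml.dom.minidom import Node, parseString

dom = parseString("<a><b/><c/></a>")
children = dom.documentElement.childNodes

# NodeList behaves like a Python list...
names = [node.tagName for node in children]
first_via_index = children[0]

# ...and also offers the DOM-style item() method and length attribute.
first_via_item = children.item(0)
count = children.length

# Constants are class-level names that should be treated as read-only.
is_element = first_via_index.nodeType == Node.ELEMENT_NODE
```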
+
+The following interfaces have no implementation in :mod:`xml.dom.minidom`:
+
+* :class:`DOMTimeStamp`
+
+* :class:`DocumentType` (added in Python 2.1)
+
+* :class:`DOMImplementation` (added in Python 2.1)
+
+* :class:`CharacterData`
+
+* :class:`CDATASection`
+
+* :class:`Notation`
+
+* :class:`Entity`
+
+* :class:`EntityReference`
+
+* :class:`DocumentFragment`
+
+Most of these reflect information in the XML document that is of little use to
+most DOM users.
+