diff options
Diffstat (limited to 'Doc/lib/xmldomminidom.tex')
-rw-r--r-- | Doc/lib/xmldomminidom.tex | 285 |
1 files changed, 0 insertions, 285 deletions
diff --git a/Doc/lib/xmldomminidom.tex b/Doc/lib/xmldomminidom.tex deleted file mode 100644 index 093915f..0000000 --- a/Doc/lib/xmldomminidom.tex +++ /dev/null @@ -1,285 +0,0 @@ -\section{\module{xml.dom.minidom} --- - Lightweight DOM implementation} - -\declaremodule{standard}{xml.dom.minidom} -\modulesynopsis{Lightweight Document Object Model (DOM) implementation.} -\moduleauthor{Paul Prescod}{paul@prescod.net} -\sectionauthor{Paul Prescod}{paul@prescod.net} -\sectionauthor{Martin v. L\"owis}{martin@v.loewis.de} - -\versionadded{2.0} - -\module{xml.dom.minidom} is a light-weight implementation of the -Document Object Model interface. It is intended to be -simpler than the full DOM and also significantly smaller. - -DOM applications typically start by parsing some XML into a DOM. With -\module{xml.dom.minidom}, this is done through the parse functions: - -\begin{verbatim} -from xml.dom.minidom import parse, parseString - -dom1 = parse('c:\\temp\\mydata.xml') # parse an XML file by name - -datasource = open('c:\\temp\\mydata.xml') -dom2 = parse(datasource) # parse an open file - -dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>') -\end{verbatim} - -The \function{parse()} function can take either a filename or an open -file object. - -\begin{funcdesc}{parse}{filename_or_file{, parser}} - Return a \class{Document} from the given input. \var{filename_or_file} - may be either a file name, or a file-like object. \var{parser}, if - given, must be a SAX2 parser object. This function will change the - document handler of the parser and activate namespace support; other - parser configuration (like setting an entity resolver) must have been - done in advance. -\end{funcdesc} - -If you have XML in a string, you can use the -\function{parseString()} function instead: - -\begin{funcdesc}{parseString}{string\optional{, parser}} - Return a \class{Document} that represents the \var{string}. This - method creates a \class{StringIO} object for the string and passes - that on to \function{parse}. -\end{funcdesc} - -Both functions return a \class{Document} object representing the -content of the document. - -What the \function{parse()} and \function{parseString()} functions do -is connect an XML parser with a ``DOM builder'' that can accept parse -events from any SAX parser and convert them into a DOM tree. The name -of the functions are perhaps misleading, but are easy to grasp when -learning the interfaces. The parsing of the document will be -completed before these functions return; it's simply that these -functions do not provide a parser implementation themselves. - -You can also create a \class{Document} by calling a method on a ``DOM -Implementation'' object. You can get this object either by calling -the \function{getDOMImplementation()} function in the -\refmodule{xml.dom} package or the \module{xml.dom.minidom} module. -Using the implementation from the \module{xml.dom.minidom} module will -always return a \class{Document} instance from the minidom -implementation, while the version from \refmodule{xml.dom} may provide -an alternate implementation (this is likely if you have the -\ulink{PyXML package}{http://pyxml.sourceforge.net/} installed). Once -you have a \class{Document}, you can add child nodes to it to populate -the DOM: - -\begin{verbatim} -from xml.dom.minidom import getDOMImplementation - -impl = getDOMImplementation() - -newdoc = impl.createDocument(None, "some_tag", None) -top_element = newdoc.documentElement -text = newdoc.createTextNode('Some textual content.') -top_element.appendChild(text) -\end{verbatim} - -Once you have a DOM document object, you can access the parts of your -XML document through its properties and methods. These properties are -defined in the DOM specification. The main property of the document -object is the \member{documentElement} property. It gives you the -main element in the XML document: the one that holds all others. Here -is an example program: - -\begin{verbatim} -dom3 = parseString("<myxml>Some data</myxml>") -assert dom3.documentElement.tagName == "myxml" -\end{verbatim} - -When you are finished with a DOM, you should clean it up. This is -necessary because some versions of Python do not support garbage -collection of objects that refer to each other in a cycle. Until this -restriction is removed from all versions of Python, it is safest to -write your code as if cycles would not be cleaned up. - -The way to clean up a DOM is to call its \method{unlink()} method: - -\begin{verbatim} -dom1.unlink() -dom2.unlink() -dom3.unlink() -\end{verbatim} - -\method{unlink()} is a \module{xml.dom.minidom}-specific extension to -the DOM API. After calling \method{unlink()} on a node, the node and -its descendants are essentially useless. - -\begin{seealso} - \seetitle[http://www.w3.org/TR/REC-DOM-Level-1/]{Document Object - Model (DOM) Level 1 Specification} - {The W3C recommendation for the - DOM supported by \module{xml.dom.minidom}.} -\end{seealso} - - -\subsection{DOM Objects \label{dom-objects}} - -The definition of the DOM API for Python is given as part of the -\refmodule{xml.dom} module documentation. This section lists the -differences between the API and \refmodule{xml.dom.minidom}. - - -\begin{methoddesc}[Node]{unlink}{} -Break internal references within the DOM so that it will be garbage -collected on versions of Python without cyclic GC. Even when cyclic -GC is available, using this can make large amounts of memory available -sooner, so calling this on DOM objects as soon as they are no longer -needed is good practice. This only needs to be called on the -\class{Document} object, but may be called on child nodes to discard -children of that node. -\end{methoddesc} - -\begin{methoddesc}[Node]{writexml}{writer\optional{,indent=""\optional{,addindent=""\optional{,newl=""}}}} -Write XML to the writer object. The writer should have a -\method{write()} method which matches that of the file object -interface. The \var{indent} parameter is the indentation of the current -node. The \var{addindent} parameter is the incremental indentation to use -for subnodes of the current one. The \var{newl} parameter specifies the -string to use to terminate newlines. - -\versionchanged[The optional keyword parameters -\var{indent}, \var{addindent}, and \var{newl} were added to support pretty -output]{2.1} - -\versionchanged[For the \class{Document} node, an additional keyword -argument \var{encoding} can be used to specify the encoding field of the XML -header]{2.3} -\end{methoddesc} - -\begin{methoddesc}[Node]{toxml}{\optional{encoding}} -Return the XML that the DOM represents as a string. - -With no argument, the XML header does not specify an encoding, and the -result is Unicode string if the default encoding cannot represent all -characters in the document. Encoding this string in an encoding other -than UTF-8 is likely incorrect, since UTF-8 is the default encoding of -XML. - -With an explicit \var{encoding} argument, the result is a byte string -in the specified encoding. It is recommended that this argument is -always specified. To avoid \exception{UnicodeError} exceptions in case of -unrepresentable text data, the encoding argument should be specified -as "utf-8". - -\versionchanged[the \var{encoding} argument was introduced]{2.3} -\end{methoddesc} - -\begin{methoddesc}[Node]{toprettyxml}{\optional{indent\optional{, newl}}} -Return a pretty-printed version of the document. \var{indent} specifies -the indentation string and defaults to a tabulator; \var{newl} specifies -the string emitted at the end of each line and defaults to \code{\e n}. - -\versionadded{2.1} -\versionchanged[the encoding argument; see \method{toxml()}]{2.3} -\end{methoddesc} - -The following standard DOM methods have special considerations with -\refmodule{xml.dom.minidom}: - -\begin{methoddesc}[Node]{cloneNode}{deep} -Although this method was present in the version of -\refmodule{xml.dom.minidom} packaged with Python 2.0, it was seriously -broken. This has been corrected for subsequent releases. -\end{methoddesc} - - -\subsection{DOM Example \label{dom-example}} - -This example program is a fairly realistic example of a simple -program. In this particular case, we do not take much advantage -of the flexibility of the DOM. - -\verbatiminput{minidom-example.py} - - -\subsection{minidom and the DOM standard \label{minidom-and-dom}} - -The \refmodule{xml.dom.minidom} module is essentially a DOM -1.0-compatible DOM with some DOM 2 features (primarily namespace -features). - -Usage of the DOM interface in Python is straight-forward. The -following mapping rules apply: - -\begin{itemize} -\item Interfaces are accessed through instance objects. Applications - should not instantiate the classes themselves; they should use - the creator functions available on the \class{Document} object. - Derived interfaces support all operations (and attributes) from - the base interfaces, plus any new operations. - -\item Operations are used as methods. Since the DOM uses only - \keyword{in} parameters, the arguments are passed in normal - order (from left to right). There are no optional - arguments. \keyword{void} operations return \code{None}. - -\item IDL attributes map to instance attributes. For compatibility - with the OMG IDL language mapping for Python, an attribute - \code{foo} can also be accessed through accessor methods - \method{_get_foo()} and \method{_set_foo()}. \keyword{readonly} - attributes must not be changed; this is not enforced at - runtime. - -\item The types \code{short int}, \code{unsigned int}, \code{unsigned - long long}, and \code{boolean} all map to Python integer - objects. - -\item The type \code{DOMString} maps to Python strings. - \refmodule{xml.dom.minidom} supports either byte or Unicode - strings, but will normally produce Unicode strings. Values - of type \code{DOMString} may also be \code{None} where allowed - to have the IDL \code{null} value by the DOM specification from - the W3C. - -\item \keyword{const} declarations map to variables in their - respective scope - (e.g. \code{xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE}); - they must not be changed. - -\item \code{DOMException} is currently not supported in - \refmodule{xml.dom.minidom}. Instead, - \refmodule{xml.dom.minidom} uses standard Python exceptions such - as \exception{TypeError} and \exception{AttributeError}. - -\item \class{NodeList} objects are implemented using Python's built-in - list type. Starting with Python 2.2, these objects provide the - interface defined in the DOM specification, but with earlier - versions of Python they do not support the official API. They - are, however, much more ``Pythonic'' than the interface defined - in the W3C recommendations. -\end{itemize} - - -The following interfaces have no implementation in -\refmodule{xml.dom.minidom}: - -\begin{itemize} -\item \class{DOMTimeStamp} - -\item \class{DocumentType} (added in Python 2.1) - -\item \class{DOMImplementation} (added in Python 2.1) - -\item \class{CharacterData} - -\item \class{CDATASection} - -\item \class{Notation} - -\item \class{Entity} - -\item \class{EntityReference} - -\item \class{DocumentFragment} -\end{itemize} - -Most of these reflect information in the XML document that is not of -general utility to most DOM users. |