Paul Prescod <paul@prescod.net>:

Documentation for the xml.dom.minidom module & Python DOM API. FLD: I have revised the markup in some places and added a few minor details to Paul's text, but that's it. Given the substantial structural differences with the bulk of the presentation, I will be making additional revisions over the next few days.
author: Fred Drake <fdrake@acm.org> 2000-10-24 02:34:45 (GMT)
committer: Fred Drake <fdrake@acm.org> 2000-10-24 02:34:45 (GMT)
commit: 669d36f02c6bae1fff38c767ee62a3c12fde43ff (patch)
tree: 7dde206803def7a7ef92ab319727b63987d0a4dc /Doc/lib
parent: f61eac425a1061654150c4687c94bc71c0f6b7a2 (diff)
1 files changed, 614 insertions, 0 deletions
diff --git a/Doc/lib/xmldom.tex b/Doc/lib/xmldom.tex
new file mode 100644
index 0000000..c2945a4
--- /dev/null
+++ b/Doc/lib/xmldom.tex
@@ -0,0 +1,614 @@
+\section{\module{xml.dom.minidom} ---
+         The Document Object Model}
+
+\declaremodule{standard}{xml.dom.minidom}
+\modulesynopsis{Lightweight Document Object Model (DOM) implementation.}
+\moduleauthor{Paul Prescod}{paul@prescod.net}
+\sectionauthor{Paul Prescod}{paul@prescod.net}
+\sectionauthor{Martin v. L\"owis}{loewis@informatik.hu-berlin.de}
+
+\versionadded{2.0}
+
+The \module{xml.dom.minidom} module provides a light-weight
+implementation of the W3C Document Object Model.  The DOM is a
+cross-language API from the World Wide Web Consortium (W3C) for
+accessing and modifying XML documents.  A DOM implementation lets you
+convert an XML document into a tree-like structure, or build such a
+structure from scratch.  It then gives access to the structure through
+a set of objects which provide well-known interfaces.  Minidom is
+intended to be simpler than the full DOM and also significantly smaller.
+
+The DOM is extremely useful for random-access applications. SAX only
+allows you a view of one bit of the document at a time. If you are
+looking at one SAX element, you have no access to another. If you are
+looking at a text node, you have no access to a containing
+element. When you write a SAX application, you need to keep track of
+your program's position in the document somewhere in your own
+code. SAX does not do it for you. Also, if you need to look ahead in
+the XML document, you are just out of luck.
+
+Some applications are simply impossible in an event-driven model with
+no access to a tree. Of course you could build some sort of tree
+yourself in SAX events, but the DOM allows you to avoid writing that
+code. The DOM is a standard tree representation for XML data.
+
+%What if your needs are somewhere between SAX and the DOM? Perhaps you cannot
+%afford to load the entire tree in memory but you find the SAX model 
+%somewhat cumbersome and low-level. There is also an experimental module
+%called pulldom that allows you to build trees of only the parts of a 
+%document that you need structured access to. It also has features that allow
+%you to find your way around the DOM. 
+% See http://www.prescod.net/python/pulldom
+
+DOM applications typically start by parsing some XML into a DOM. This
+is done through the parse functions:
+
+\begin{verbatim}
+from xml.dom.minidom import parse, parseString
+
+dom1 = parse('c:\\temp\\mydata.xml') # parse an XML file by name
+
+datasource = open('c:\\temp\\mydata.xml')
+dom2 = parse(datasource)   # parse an open file
+
+dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')
+\end{verbatim}
+
+The \function{parse()} function accepts either a filename or an open file object.
+
+\begin{funcdesc}{parse}{filename_or_file\optional{, parser}}
+  Return a \class{Document} from the given input. \var{filename_or_file}
+  may be either a file name, or a file-like object. \var{parser}, if
+  given, must be a SAX2 parser object. This function will change the
+  document handler of the parser and activate namespace support; other
+  parser configuration (like setting an entity resolver) must have been
+  done in advance.
+\end{funcdesc}
+
+If you have XML in a string, you can use the \function{parseString()}
+function instead:
+
+\begin{funcdesc}{parseString}{string\optional{, parser}}
+  Return a \class{Document} that represents the \var{string}.  This
+  function creates a \class{StringIO} object for the string and passes
+  that on to \function{parse}.
+\end{funcdesc}
+
+Both functions return a document object representing the content of
+the document.
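The two entry points described above can be sketched as follows. This is only an illustration, assuming a Python 3 interpreter; the names xml_text, dom_from_string and dom_from_file are made up for the example, and an in-memory io.BytesIO object stands in for an open file.

```python
import io
from xml.dom.minidom import parse, parseString

# Hypothetical sample markup used only for this sketch.
xml_text = "<myxml>Some data<empty/> some more data</myxml>"

# Parse XML held in a string.
dom_from_string = parseString(xml_text)

# parse() also accepts a file-like object; BytesIO stands in for an open file.
dom_from_file = parse(io.BytesIO(xml_text.encode("utf-8")))

# Both calls yield a Document whose root element is <myxml>.
root_name = dom_from_string.documentElement.tagName
same_markup = (dom_from_string.documentElement.toxml()
               == dom_from_file.documentElement.toxml())
```

Either route ends with the same kind of object, so the rest of a program does not need to know where the XML came from.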
+
+You can also create a document by instantiating a \class{Document}
+object directly and then adding child nodes to it to populate
+the DOM.
+
+\begin{verbatim}
+from xml.dom.minidom import Document
+
+newdoc = Document()
+newel = newdoc.createElement("some_tag")
+newdoc.appendChild(newel)
+\end{verbatim}
+
+Once you have a DOM document object, you can access the parts of your
+XML document through its properties and methods.  These properties are
+defined in the DOM specification.  The main property of the document
+object is the documentElement property.  It gives you the main element
+in the XML document: the one that holds all others.  Here is an
+example program:
+
+\begin{verbatim}
+dom3 = parseString("<myxml>Some data</myxml>")
+assert dom3.documentElement.tagName == "myxml"
+\end{verbatim}
+
+When you are finished with a DOM, you should clean it up.  This is
+necessary because some versions of Python do not support garbage
+collection of objects that refer to each other in a cycle.  Until this
+restriction is removed from all versions of Python, it is safest to
+write your code as if cycles would not be cleaned up.
+
+The way to clean up a DOM is to call its \method{unlink()} method:
+
+\begin{verbatim}
+dom1.unlink()
+dom2.unlink()
+dom3.unlink()
+\end{verbatim}
+
+\method{unlink()} is a \module{minidom}-specific extension to the DOM
+API.  After calling \method{unlink()}, a DOM is basically useless.
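One way to make sure the cleanup always happens is sketched below. It assumes a Python 3 interpreter, where document objects can also be used in a with statement that unlinks them on exit; the variable names are illustrative only.

```python
from xml.dom.minidom import parseString

# Explicit cleanup: unlink() runs even if the processing code raises.
dom = parseString("<myxml>Some data</myxml>")
try:
    root_name = dom.documentElement.tagName
finally:
    dom.unlink()

# Assumption: in Python 3 a Document is a context manager, and leaving
# the block calls unlink() on it.
with parseString("<myxml>Some data</myxml>") as dom2:
    text = dom2.documentElement.firstChild.data
```

Any values needed later (here root_name and text) should be copied out before the document is unlinked.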
+
+\begin{seealso}
+  \seetitle[http://www.w3.org/TR/REC-DOM-Level-1/]{DOM Specification}
+           {This is the canonical specification for the level of the
+            DOM supported by \module{xml.dom.minidom}.}
+  \seetitle[http://pyxml.sourceforge.net]{PyXML}{Users that require a
+            full-featured implementation of DOM should use the PyXML
+            package.}
+\end{seealso}
+
+
+\subsection{DOM objects \label{dom-objects}}
+
+The definitive documentation for the DOM is the DOM specification from
+the W3C.  This section lists the properties and methods supported by
+\refmodule{xml.dom.minidom}.
+
+\begin{classdesc}{Node}{}
+All of the components of an XML document are subclasses of
+\class{Node}.
+
+\begin{memberdesc}{nodeType}
+An integer representing the node type.  Symbolic constants for the
+types are on the \class{Node} object: \constant{ELEMENT_NODE},
+\constant{ATTRIBUTE_NODE}, \constant{TEXT_NODE},
+\constant{CDATA_SECTION_NODE}, \constant{ENTITY_NODE},
+\constant{PROCESSING_INSTRUCTION_NODE}, \constant{COMMENT_NODE},
+\constant{DOCUMENT_NODE}, \constant{DOCUMENT_TYPE_NODE},
+\constant{NOTATION_NODE}.
+\end{memberdesc}
+
+\begin{memberdesc}{parentNode}
+The parent of the current node.  \code{None} for the document node.
+\end{memberdesc}
+
+\begin{memberdesc}{attributes}
+An \class{AttributeList} of attribute objects.  Only elements have
+actual values for this; for other node types it is \code{None}.
+\end{memberdesc}
+
+\begin{memberdesc}{previousSibling}
+The node that immediately precedes this one with the same parent.  For
+instance, this is the element whose end-tag comes just before the
+\var{self} element's start-tag.  Of course, XML documents are made
+up of more than just elements, so the previous sibling could be text, a
+comment, or something else.
+\end{memberdesc}
+
+\begin{memberdesc}{nextSibling}
+The node that immediately follows this one with the same parent.  See
+also \member{previousSibling}.
+\end{memberdesc}
+
+\begin{memberdesc}{childNodes}
+A list of nodes contained within this node.
+\end{memberdesc}
+
+\begin{memberdesc}{firstChild}
+Equivalent to \code{childNodes[0]}.
+\end{memberdesc}
+
+\begin{memberdesc}{lastChild}
+Equivalent to \code{childNodes[-1]}.
+\end{memberdesc}
+
+\begin{memberdesc}{nodeName}
+Has a different meaning for each node type.  See the DOM specification
+for details.  You can always get the information you would get here
+from another property such as the \member{tagName} property for
+elements or the \member{name} property for attributes.
+\end{memberdesc}
+
+\begin{memberdesc}{nodeValue}
+Has a different meaning for each node type.  See the DOM specification
+for details.  The situation is similar to that with \member{nodeName}.
+\end{memberdesc}
+
+\begin{methoddesc}{unlink}{}
+Break internal references within the DOM so that it will be garbage
+collected on versions of Python without cyclic GC.
+\end{methoddesc}
+
+\begin{methoddesc}{writexml}{writer}
+Write XML to the writer object.  The writer should have a
+\method{write()} method which matches that of the file object
+interface.
+\end{methoddesc}
+
+\begin{methoddesc}{toxml}{}
+Return the XML string that the DOM represents.
+\end{methoddesc}
+
+\begin{methoddesc}{hasChildNodes}{}
+Returns true if the node has any child nodes.
+\end{methoddesc}
+
+\begin{methoddesc}{insertBefore}{newChild, refChild}
+Insert a new child node before an existing child.  It must be the case
+that \var{refChild} is a child of this node; if not,
+\exception{ValueError} is raised.
+\end{methoddesc}
+
+\begin{methoddesc}{replaceChild}{newChild, oldChild}
+Replace an existing node with a new node. It must be the case that 
+\var{oldChild} is a child of this node; if not,
+\exception{ValueError} is raised.
+\end{methoddesc}
+
+\begin{methoddesc}{removeChild}{oldChild}
+Remove a child node.  \var{oldChild} must be a child of this node; if
+not, \exception{ValueError} is raised.
+\end{methoddesc}
+
+\begin{methoddesc}{appendChild}{newChild}
+Add a new child node to the end of this node's list of children.
+\end{methoddesc}
+
+\begin{methoddesc}{cloneNode}{deep}
+Clone this node.  Setting \var{deep} means to clone all children as
+well.  Deep cloning is not implemented in Python 2, so the \var{deep}
+parameter should always be 0 for now.
+\end{methoddesc}
+
+\end{classdesc}
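The navigation members and tree-editing methods of Node can be sketched together as below. This is a minimal illustration, assuming a Python 3 interpreter; the sample markup and the names new_item and old_first are invented for the example.

```python
from xml.dom.minidom import parseString

# Hypothetical two-item list; no whitespace, so every child is an element.
doc = parseString("<list><item>b</item><item>c</item></list>")
root = doc.documentElement
old_first = root.firstChild

# Build a new <item>a</item> and insert it before the current first child.
new_item = doc.createElement("item")
new_item.appendChild(doc.createTextNode("a"))
root.insertBefore(new_item, old_first)

# Remove the last child (<item>c</item>).
root.removeChild(root.lastChild)

# Sibling links and the serialized result after the edits.
linked = new_item.nextSibling is old_first and old_first.previousSibling is new_item
result = root.toxml()
```

After these steps result holds the string `<list><item>a</item><item>b</item></list>`.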
+
+
+\begin{classdesc}{Document}{}
+Represents an entire XML document, including its constituent elements,
+attributes, processing instructions, comments etc.  Remember that it
+inherits properties from \class{Node}.
+
+\begin{memberdesc}{documentElement}
+The one and only root element of the document.
+\end{memberdesc}
+
+\begin{methoddesc}{createElement}{tagName}
+Create a new element.  The element is not inserted into the document
+when it is created.  You need to explicitly insert it with one of the
+other methods such as \method{insertBefore()} or
+\method{appendChild()}.
+\end{methoddesc}
+
+\begin{methoddesc}{createTextNode}{data}
+Create a text node containing the data passed as a parameter.  As with
+the other creation methods, this one does not insert the node into the
+tree.
+\end{methoddesc}
+
+\begin{methoddesc}{createComment}{data}
+Create a comment node containing the data passed as a parameter.  As
+with the other creation methods, this one does not insert the node
+into the tree.
+\end{methoddesc}
+
+\begin{methoddesc}{createProcessingInstruction}{target, data}
+Create a processing instruction node containing the \var{target} and
+\var{data} passed as parameters.  As with the other creation methods,
+this one does not insert the node into the tree.
+\end{methoddesc}
+
+\begin{methoddesc}{createAttribute}{name}
+Create an attribute node.  This method does not associate the
+attribute node with any particular element.  You must use
+\method{setAttributeNode()} on the appropriate \class{Element} object
+to use the newly created attribute instance.
+\end{methoddesc}
+
+\begin{methoddesc}{createElementNS}{namespaceURI, tagName}
+Create a new element with a namespace.  The \var{tagName} may have a
+prefix.  The element is not inserted into the document when it is
+created.  You need to explicitly insert it with one of the other
+methods such as \method{insertBefore()} or \method{appendChild()}.
+\end{methoddesc}
+
+
+\begin{methoddesc}{createAttributeNS}{namespaceURI, qualifiedName}
+Create an attribute node with a namespace.  The \var{qualifiedName} may have
+a prefix.  This method does not associate the attribute node with any
+particular element. You must use \method{setAttributeNode()} on the
+appropriate \class{Element} object to use the newly created attribute
+instance.
+\end{methoddesc}
+
+\begin{methoddesc}{getElementsByTagName}{tagName}
+Search for all descendants (direct children, children's children,
+etc.) with a particular element type name.
+\end{methoddesc}
+
+\begin{methoddesc}{getElementsByTagNameNS}{namespaceURI, localName}
+Search for all descendants (direct children, children's children,
+etc.) with a particular namespace URI and localname.  The localname is
+the part of the qualified name after the prefix.
+\end{methoddesc}
+
+\end{classdesc}
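Because the creation methods never attach the new node to the tree, building a document is always a two-step affair: create, then insert. A small sketch of that pattern follows, assuming a Python 3 interpreter; the element names and text are made up for the example.

```python
from xml.dom.minidom import Document

doc = Document()
root = doc.createElement("slideshow")
doc.appendChild(root)          # the root must be attached explicitly

for text in ("First", "Second"):
    slide = doc.createElement("slide")          # created, not yet in the tree
    slide.appendChild(doc.createTextNode(text))
    root.appendChild(slide)                     # now it is part of the tree

root.appendChild(doc.createComment("end of slides"))

# Search all descendants by element type name.
slides = doc.getElementsByTagName("slide")
titles = [s.firstChild.data for s in slides]
markup = root.toxml()
```

Here titles is `['First', 'Second']`, and markup is `<slideshow><slide>First</slide><slide>Second</slide><!--end of slides--></slideshow>`.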
+
+
+\begin{classdesc}{Element}{}
+\begin{memberdesc}{tagName}
+The element type name.  In a namespace-using document it may have
+colons in it.
+\end{memberdesc}
+
+\begin{memberdesc}{localName}
+The part of the \member{tagName} following the colon if there is one,
+else the entire \member{tagName}.
+\end{memberdesc}
+
+\begin{memberdesc}{prefix}
+The part of the \member{tagName} preceding the colon if there is one,
+else the empty string.
+\end{memberdesc}
+
+\begin{memberdesc}{namespaceURI}
+The namespace associated with the tagName.
+\end{memberdesc}
+
+\begin{methoddesc}{getAttribute}{attname}
+Return an attribute value as a string.
+\end{methoddesc}
+
+\begin{methoddesc}{setAttribute}{attname, value}
+Set an attribute value from a string.
+\end{methoddesc}
+
+\begin{methoddesc}{removeAttribute}{attname}
+Remove an attribute by name.
+\end{methoddesc}
+
+\begin{methoddesc}{getAttributeNS}{namespaceURI, localName}
+Return an attribute value as a string, given a \var{namespaceURI} and
+\var{localName}.  Note that a localname is the part of a prefixed
+attribute name after the colon (if there is one).
+\end{methoddesc}
+
+\begin{methoddesc}{setAttributeNS}{namespaceURI, qname, value}
+Set an attribute value from a string, given a \var{namespaceURI} and a
+\var{qname}.  Note that a qname is the whole attribute name, including
+any prefix.  This is different from \method{getAttributeNS()}.
+\end{methoddesc}
+
+\begin{methoddesc}{removeAttributeNS}{namespaceURI, localName}
+Remove an attribute by name.  Note that it uses a localName, not a
+qname.
+\end{methoddesc}
+
+\begin{methoddesc}{getElementsByTagName}{tagName}
+Same as equivalent method in the \class{Document} class.
+\end{methoddesc}
+
+\begin{methoddesc}{getElementsByTagNameNS}{namespaceURI, localName}
+Same as equivalent method in the \class{Document} class.
+\end{methoddesc}
+
+\end{classdesc}
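The difference between the plain and the namespace-aware attribute methods is easiest to see side by side. The sketch below assumes a Python 3 interpreter; the namespace URI http://example.org/ns, the x prefix and the attribute names are hypothetical.

```python
from xml.dom.minidom import Document

NS = "http://example.org/ns"   # hypothetical namespace URI

doc = Document()
el = doc.createElement("slide")

# Plain attributes are addressed by their full name.
el.setAttribute("id", "s1")
id_value = el.getAttribute("id")

# Namespace-aware setter takes the qualified name (prefix:local) ...
el.setAttributeNS(NS, "x:lang", "en")
# ... while the getter and remover take only the local name.
lang_value = el.getAttributeNS(NS, "lang")

el.removeAttribute("id")
el.removeAttributeNS(NS, "lang")
has_id_after = el.hasAttribute("id")
has_lang_after = el.hasAttributeNS(NS, "lang")
```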
+
+
+\begin{classdesc}{Attribute}{}
+
+\begin{memberdesc}{name}
+The attribute name.  In a namespace-using document it may have colons
+in it.
+\end{memberdesc}
+
+\begin{memberdesc}{localName}
+The part of the name following the colon if there is one, else the
+entire name.
+\end{memberdesc}
+
+\begin{memberdesc}{prefix}
+The part of the name preceding the colon if there is one, else the
+empty string.
+\end{memberdesc}
+
+\begin{memberdesc}{namespaceURI}
+The namespace associated with the attribute name.
+\end{memberdesc}
+
+\end{classdesc}
+
+
+\begin{classdesc}{AttributeList}{}
+
+\begin{memberdesc}{length}
+The length of the attribute list.
+\end{memberdesc}
+
+\begin{methoddesc}{item}{index}
+Return an attribute with a particular index.  The order you get the
+attributes in is arbitrary but will be consistent for the life of a
+DOM.  Each item is an attribute node.  Get its value with the
+\member{value} attribute.
+\end{methoddesc}
+
+There are also experimental methods that give this class more
+dictionary-like behavior. You can use them or you can use the
+standardized \method{getAttribute*()}-family methods.
+
+\end{classdesc}
+
+
+\begin{classdesc}{Comment}{}
+Represents a comment in the XML document.
+
+\begin{memberdesc}{data}
+The content of the comment.
+\end{memberdesc}
+\end{classdesc}
+
+
+\begin{classdesc}{Text}{}
+Represents text in the XML document.
+
+\begin{memberdesc}{data}
+The content of the text node.
+\end{memberdesc}
+\end{classdesc}
+
+
+\begin{classdesc}{ProcessingInstruction}{}
+Represents a processing instruction in the XML document.
+
+\begin{memberdesc}{target}
+The content of the processing instruction up to the first whitespace
+character.
+\end{memberdesc}
+
+\begin{memberdesc}{data}
+The content of the processing instruction following the first
+whitespace character.
+\end{memberdesc}
+\end{classdesc}
+
+Note that DOM attributes may also be manipulated as nodes instead of as
+simple strings. It is fairly rare that you must do this, however, so this
+usage is not yet documented here.
+
+
+\begin{seealso}
+  \seetitle[http://www.w3.org/TR/REC-DOM-Level-1/]{DOM Specification}
+           {This is the canonical specification for the level of the
+            DOM supported by \module{xml.dom.minidom}.}
+\end{seealso}
+
+
+\subsection{DOM Example \label{dom-example}}
+
+This example is a fairly realistic, if simple, program.  In this
+particular case, we do not take much advantage of the flexibility of
+the DOM.
+
+\begin{verbatim}
+from xml.dom.minidom import parse, parseString
+
+document="""
+<slideshow>
+<title>Demo slideshow</title>
+<slide><title>Slide title</title>
+<point>This is a demo</point>
+<point>Of a program for processing slides</point>
+</slide>
+
+<slide><title>Another demo slide</title>
+<point>It is important</point>
+<point>To have more than</point>
+<point>one slide</point>
+</slide>
+</slideshow>
+"""
+
+dom = parseString(document)
+
+# Collect the character data of the text nodes in a node list.
+def getText(nodelist):
+    rc = ""
+    for node in nodelist:
+        if node.nodeType == node.TEXT_NODE:
+            rc = rc + node.data
+    return rc
+
+def handleSlideshow(slideshow):
+    print("<html>")
+    handleSlideshowTitle(slideshow.getElementsByTagName("title")[0])
+    slides = slideshow.getElementsByTagName("slide")
+    handleToc(slides)
+    handleSlides(slides)
+    print("</html>")
+
+def handleSlides(slides):
+    for slide in slides:
+        handleSlide(slide)
+
+def handleSlide(slide):
+    handleSlideTitle(slide.getElementsByTagName("title")[0])
+    handlePoints(slide.getElementsByTagName("point"))
+
+def handleSlideshowTitle(title):
+    print("<title>%s</title>" % getText(title.childNodes))
+
+def handleSlideTitle(title):
+    print("<h2>%s</h2>" % getText(title.childNodes))
+
+def handlePoints(points):
+    print("<ul>")
+    for point in points:
+        handlePoint(point)
+    print("</ul>")
+
+def handlePoint(point):
+    print("<li>%s</li>" % getText(point.childNodes))
+
+def handleToc(slides):
+    for slide in slides:
+        title = slide.getElementsByTagName("title")[0]
+        print("<p>%s</p>" % getText(title.childNodes))
+
+handleSlideshow(dom)
+\end{verbatim}
+
+\subsection{minidom and the DOM standard \label{minidom-and-dom}}
+
+Minidom is basically a DOM 1.0-compatible DOM with some DOM 2 features
+(primarily namespace features). 
+
+Usage of the other DOM interfaces in Python is straightforward.  The
+following mapping rules apply:
+
+\begin{itemize}
+
+\item Interfaces are accessed through instance objects.  Applications
+should not instantiate the classes themselves; they should use the
+creator functions.  Derived interfaces support all operations (and
+attributes) from the base interfaces, plus any new operations.
+
+\item Operations are used as methods.  Since the DOM uses only
+\code{in} parameters, the arguments are passed in normal order (from
+left to right).  There are no optional arguments.  \code{void}
+operations return \code{None}.
+
+\item IDL attributes map to instance attributes.  For compatibility
+with the OMG IDL language mapping for Python, an attribute \code{foo}
+can also be accessed through accessor functions \code{_get_foo} and
+\code{_set_foo}.  \code{readonly} attributes must not be changed.
+
+\item The types \code{short int}, \code{unsigned int},
+\code{unsigned long long}, and \code{boolean} all map to Python
+integer objects.
+
+\item The type \code{DOMString} maps to Python strings.
+\module{minidom} supports either byte or Unicode strings, but will
+normally produce Unicode strings.  Attributes of type \code{DOMString}
+may also be \code{None}.
+
+\item \code{const} declarations map to variables in their respective
+scope (e.g. \code{xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE});
+they must not be changed.
+
+\item \code{DOMException} is currently not supported in
+\module{minidom}.  Instead, \module{minidom} raises standard Python
+exceptions such as \exception{TypeError} and
+\exception{AttributeError}.
+
+\end{itemize}
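A few of the mapping rules listed above can be observed directly. The sketch below is only an illustration, assuming a Python 3 interpreter (where the Unicode string type is str); the sample markup and variable names are invented.

```python
from xml.dom.minidom import Node, parseString

doc = parseString("<note lang='en'>hi</note>")
root = doc.documentElement

# const declarations appear as attributes of the Node class.
is_element = root.nodeType == Node.ELEMENT_NODE

# DOMString attributes come back as Python strings ...
tag_is_str = isinstance(root.tagName, str)
# ... or as None when there is no value, as for an element without a namespace.
no_namespace = root.namespaceURI is None

# void operations return None.
normalize_result = root.normalize()
```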
+
+The following interfaces have no equivalent in minidom:
+
+\begin{itemize}
+
+\item DOMTimeStamp
+
+\item DocumentType
+
+\item DOMImplementation
+
+\item CharacterData
+
+\item CDATASection
+
+\item Notation
+
+\item Entity
+
+\item EntityReference
+
+\item DocumentFragment
+
+\end{itemize}
+
+Most of these reflect information in the XML document that is not of
+general utility to most DOM users.