author Andrew M. Kuchling <amk@amk.ca> 2000-10-12 02:37:14 (GMT)
committer Andrew M. Kuchling <amk@amk.ca> 2000-10-12 02:37:14 (GMT)
commit 6032c48b475c53263dcb2c9e5955e033fdf599d8
tree 2a0e1ce02883a404d5748a6fb9a9c266baeaa959
parent 0be483fd4d0b3fd9bf531e6ea56a73a5cd7f680d
Add new section on the XML package. (This was the only major new 2.0 feature
left that wasn't covered. The article is therefore now essentially complete.) A few minor changes
-rw-r--r-- Doc/whatsnew/whatsnew20.tex 174
1 files changed, 165 insertions, 9 deletions
diff --git a/Doc/whatsnew/whatsnew20.tex b/Doc/whatsnew/whatsnew20.tex
index 7440b92..e1cd14a 100644
--- a/Doc/whatsnew/whatsnew20.tex
+++ b/Doc/whatsnew/whatsnew20.tex
@@ -156,8 +156,8 @@ type implementation by Fredrik Lundh. A detailed explanation of the
interface is in the file \file{Misc/unicode.txt} in the Python source
distribution; it's also available on the Web at
\url{http://starship.python.net/crew/lemburg/unicode-proposal.txt}.
-This article will simply cover the most significant points from the
-full interface.
+This article will simply cover the most significant points about the Unicode
+interfaces.
In Python source code, Unicode strings are written as
\code{u"string"}. Arbitrary Unicode characters can be written using a
@@ -615,12 +615,12 @@ b.append(b)
\end{verbatim}
The comparison \code{a==b} returns true, because the two recursive
-data structures are isomorphic. \footnote{See the thread ``trashcan
+data structures are isomorphic. See the thread ``trashcan
and PR\#7'' in the April 2000 archives of the python-dev mailing list
for the discussion leading up to this implementation, and some useful
relevant links.
-%http://www.python.org/pipermail/python-dev/2000-April/004834.html
-}
+% Starting URL:
+% http://www.python.org/pipermail/python-dev/2000-April/004834.html
Work has been done on porting Python to 64-bit Windows on the Itanium
processor, mostly by Trent Mick of ActiveState. (Confusingly,
@@ -950,7 +950,6 @@ expat_extension = Extension('xml.parsers.pyexpat',
)
setup (name = "PyXML", version = "0.5.4",
ext_modules =[ expat_extension ] )
-
\end{verbatim}
The Distutils can also take care of creating source and binary
@@ -966,10 +965,165 @@ development.
All this is documented in a new manual, \textit{Distributing Python
Modules}, that joins the basic set of Python documentation.
-% ======================================================================
-%\section{New XML Code}
+% ======================================================================
+\section{XML Modules}
+
+Python 1.5.2 included a simple XML parser in the form of the
+\module{xmllib} module, contributed by Sjoerd Mullender. Since
+1.5.2's release, two different interfaces for processing XML have
+become common: SAX2 (version 2 of the Simple API for XML) provides an
+event-driven interface with some similarities to \module{xmllib}, and
+the DOM (Document Object Model) provides a tree-based interface,
+transforming an XML document into a tree of nodes that can be
+traversed and modified. Python 2.0 includes a SAX2 interface and a
+stripped-down DOM interface as part of the \module{xml} package.
+Here we will give a brief overview of these new interfaces; consult
+the Python documentation or the source code for complete details.
+The Python XML SIG is also working on improved documentation.
+
+\subsection{SAX2 Support}
+
+SAX defines an event-driven interface for parsing XML. To use SAX,
+you must write a SAX handler class. Handler classes inherit from
+classes provided by SAX, such as \class{ContentHandler}, and override
+methods that will then be called by the XML parser. For example, the
+\method{startElement()} and \method{endElement()} methods are called for
+every start and end tag encountered by the parser, the
+\method{characters()} method is called for every chunk of character
+data, and so forth.
+
+The advantage of the event-driven approach is that the whole
+document doesn't have to be resident in memory at any one time, which
+matters if you are processing very large documents. However, writing
+the SAX handler class can get very complicated if you're trying to
+modify the document structure in some elaborate way.
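To make the memory point concrete, here is a minimal sketch. It assumes that the default Expat-based reader returned by \function{xml.sax.make_parser()} also implements the incremental interface (\method{feed()} and \method{close()}), so a document can be handed over in pieces as it is read or received. The handler keeps only a dictionary of tag counts, never the document itself. \class{ElementCounter}, \function{count_elements()}, and the sample chunks are hypothetical, and the code is written against the \module{xml} package in current Python releases.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Counts start tags by name; holds no reference to the document."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(chunks):
    """Feed an iterable of byte chunks to an incremental SAX parser."""
    handler = ElementCounter()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for chunk in chunks:
        # Handler methods may be called here, before the whole
        # document has arrived.
        parser.feed(chunk)
    parser.close()  # signals end of input and finishes parsing
    return handler.counts


# The input arrives in arbitrary pieces, even splitting tags in two.
pieces = [b"<play><persona>CLAU", b"DIUS</persona><per",
          b"sona>HAMLET</persona></play>"]
print(count_elements(pieces))
```

Only one chunk at a time needs to be in memory, which is what makes this style workable for very large inputs.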
+
+For example, this short program defines a handler that prints
+a message for every start and end tag, and then parses the file
+\file{hamlet.xml} using it:
+
+\begin{verbatim}
+from xml import sax
+
+class SimpleHandler(sax.ContentHandler):
+    def startElement(self, name, attrs):
+        print 'Start of element:', name, attrs.keys()
+
+    def endElement(self, name):
+        print 'End of element:', name
+
+# Create a parser object
+parser = sax.make_parser()
+
+# Tell it what handler to use
+handler = SimpleHandler()
+parser.setContentHandler( handler )
+
+# Parse a file!
+parser.parse( 'hamlet.xml' )
+\end{verbatim}
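The handler above ignores character data. Below is a minimal sketch of collecting text with \method{characters()}. SAX parsers may split one contiguous run of text across several \method{characters()} calls, so the pieces are buffered and joined only when the end tag arrives. \function{xml.sax.parseString()} parses a byte string directly, so no file is needed. \class{PersonaCollector} and the XML fragment are hypothetical, and the code is written against the \module{xml} package in current Python releases.

```python
import xml.sax


class PersonaCollector(xml.sax.ContentHandler):
    """Collects the text content of every PERSONA element."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.personae = []
        self._buffer = None  # list of text pieces while inside PERSONA

    def startElement(self, name, attrs):
        if name == "PERSONA":
            self._buffer = []

    def characters(self, content):
        # Text outside PERSONA (e.g. inside TITLE) is ignored.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "PERSONA":
            self.personae.append("".join(self._buffer))
            self._buffer = None


source = (b"<PERSONAE><TITLE>Dramatis Personae</TITLE>"
          b"<PERSONA>CLAUDIUS, king of Denmark.</PERSONA>"
          b"<PERSONA>HAMLET, son to the late, and nephew to the present king."
          b"</PERSONA></PERSONAE>")
handler = PersonaCollector()
xml.sax.parseString(source, handler)
for persona in handler.personae:
    print(persona)
```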
+
+For more information, consult the Python documentation, or the XML
+HOWTO at \url{http://www.python.org/doc/howto/xml/}.
+
+\subsection{DOM Support}
+
+The Document Object Model is a tree-based representation for an XML
+document. A top-level \class{Document} instance is the root of the
+tree, and has a single child which is the top-level \class{Element}
+instance. This \class{Element} has child nodes representing
+character data and any sub-elements, which may have further children
+of their own, and so forth. Using the DOM you can traverse the
+resulting tree any way you like, access element and attribute values,
+insert and delete nodes, and convert the tree back into XML.
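Here is a minimal sketch of walking such a tree with the \module{xml.dom.minidom} module (described below). It parses a small hypothetical fragment from a string with \function{parseString()}. The \function{walk()} helper is also hypothetical: it records each node's class name, plus the tag name or text, indented by depth. The code is written against the \module{xml} package in current Python releases.

```python
from xml.dom import minidom


def walk(node, depth=0, out=None):
    """Return one line per node: indentation, class name, tag or text."""
    if out is None:
        out = []
    line = "  " * depth + node.__class__.__name__
    if node.nodeType == node.ELEMENT_NODE:
        line += " " + node.tagName
    elif node.nodeType == node.TEXT_NODE:
        line += " " + repr(node.data)
    out.append(line)
    # Recurse into children in document order.
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out


doc = minidom.parseString("<PERSONAE><TITLE>Dramatis Personae</TITLE>"
                          "<PERSONA>HAMLET</PERSONA></PERSONAE>")
for line in walk(doc):
    print(line)
```

The \class{Document} sits at the root, and its single element child is the same node as \code{doc.documentElement}.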
+
+The DOM is useful for modifying XML documents, because you can create
+a DOM tree, modify it by adding new nodes or rearranging subtrees, and
+then produce a new XML document as output. You can also construct a
+DOM tree manually and convert it to XML, which can be a more flexible
+way of producing XML output than simply writing
+\code{<tag1>}...\code{</tag1>} to a file.
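Here is a minimal sketch of building a tree by hand with \module{xml.dom.minidom}, using the \class{Document} factory methods \method{createElement()} and \method{createTextNode()}. The \function{build_cast()} helper and the names are hypothetical, and the code is written against the \module{xml} package in current Python releases. One practical advantage over writing tags yourself is that special characters in text, such as \code{\&}, are escaped when the tree is serialized.

```python
from xml.dom import minidom


def build_cast(names):
    """Build a PERSONAE element containing one PERSONA per name."""
    doc = minidom.Document()
    root = doc.createElement("PERSONAE")
    doc.appendChild(root)
    for name in names:
        persona = doc.createElement("PERSONA")
        # The text is stored as-is and escaped only on output.
        persona.appendChild(doc.createTextNode(name))
        root.appendChild(persona)
    return doc


doc = build_cast(["HAMLET", "Rosencrantz & Guildenstern"])
print(doc.documentElement.toxml())
```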
+
+The DOM implementation included with Python lives in the
+\module{xml.dom.minidom} module. It's a lightweight implementation of
+the Level 1 DOM with support for XML namespaces. The
+\function{parse()} and \function{parseString()} convenience
+functions are provided for generating a DOM tree:
+
+\begin{verbatim}
+from xml.dom import minidom
+doc = minidom.parse('hamlet.xml')
+\end{verbatim}
+
+\code{doc} is a \class{Document} instance. \class{Document}, like all
+the other DOM classes such as \class{Element} and \class{Text}, is a
+subclass of the \class{Node} base class. All the nodes in a DOM tree
+therefore support certain common methods, such as \method{toxml()}
+which returns a string containing the XML representation of the node
+and its children. Each class also has special methods of its own; for
+example, \class{Element} and \class{Document} instances have a method
+to find all child elements with a given tag name. Continuing from the
+previous 2-line example:
+
+\begin{verbatim}
+perslist = doc.getElementsByTagName( 'PERSONA' )
+print perslist[0].toxml()
+print perslist[1].toxml()
+\end{verbatim}
-%XXX write this section...
+For the \textit{Hamlet} XML file, the above few lines output:
+
+\begin{verbatim}
+<PERSONA>CLAUDIUS, king of Denmark. </PERSONA>
+<PERSONA>HAMLET, son to the late, and nephew to the present king.</PERSONA>
+\end{verbatim}
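\method{toxml()} includes the markup. Here is a minimal sketch of extracting only the text instead, using a small hypothetical fragment and a hypothetical recursive \function{text_of()} helper. Note that \method{getElementsByTagName()} searches all descendants, not only direct children, and returns matches in document order. The code is written against the \module{xml} package in current Python releases.

```python
from xml.dom import minidom


def text_of(element):
    """Concatenate the data of all Text nodes below element."""
    parts = []
    for node in element.childNodes:
        if node.nodeType == node.TEXT_NODE:
            parts.append(node.data)
        elif node.nodeType == node.ELEMENT_NODE:
            parts.append(text_of(node))
    return "".join(parts)


doc = minidom.parseString(
    "<PERSONAE><PERSONA>CLAUDIUS, king of Denmark. </PERSONA>"
    "<PGROUP><PERSONA>ROSENCRANTZ</PERSONA></PGROUP></PERSONAE>")
# The nested PERSONA inside PGROUP is found as well.
names = [text_of(p).strip() for p in doc.getElementsByTagName("PERSONA")]
print(names)
```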
+
+The root element of the document is available as
+\code{doc.documentElement}, and its children can be easily modified
+by removing, adding, or rearranging nodes:
+
+\begin{verbatim}
+root = doc.documentElement
+
+# Remove the first child
+root.removeChild( root.childNodes[0] )
+
+# Move the new first child to the end
+root.appendChild( root.childNodes[0] )
+
+# Insert the new first child (originally,
+# the third child) before the child at index 20.
+root.insertBefore( root.childNodes[0], root.childNodes[20] )
+\end{verbatim}
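A parsed file usually contains whitespace \class{Text} nodes between elements, so indexes into \code{childNodes} count those nodes too. Here is a minimal sketch of the same three operations on a hypothetical document with no whitespace between tags, so the effect of each step is easy to see. It also shows that \method{appendChild()} and \method{insertBefore()} move a node that is already in the tree rather than copying it. The \function{tag_names()} helper is hypothetical, and the code is written against the \module{xml} package in current Python releases.

```python
from xml.dom import minidom


def tag_names(parent):
    """List the tag names of parent's element children, in order."""
    return [n.tagName for n in parent.childNodes
            if n.nodeType == n.ELEMENT_NODE]


# No whitespace between tags, so every child of <r> is an Element.
doc = minidom.parseString("<r><a/><b/><c/><d/><e/></r>")
root = doc.documentElement

root.removeChild(root.childNodes[0])   # children now: b c d e
root.appendChild(root.childNodes[0])   # moves b to the end: c d e b
# Moves c in front of e (the node at index 2 when the call is made): d c e b
root.insertBefore(root.childNodes[0], root.childNodes[2])

print(tag_names(root))
print(root.toxml())
```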
+
+Again, I will refer you to the Python documentation for a complete
+listing of the different \class{Node} classes and their various methods.
+
+\subsection{Relationship to PyXML}
+
+The XML Special Interest Group has been working on XML-related Python
+code for a while. Its code distribution, called PyXML, is available
+from the SIG's Web pages at \url{http://www.python.org/sigs/xml-sig/}.
+The PyXML distribution also used the package name \samp{xml}. If
+you've written programs that used PyXML, you're probably wondering
+about its compatibility with the 2.0 \module{xml} package.
+
+The answer is that Python 2.0's \module{xml} package isn't compatible
+with PyXML, but can be made compatible by installing a recent version
+of PyXML. Many applications can get by with the XML support that is
+included with Python 2.0, but more complicated applications will
+require the full PyXML package to be installed. When
+installed, PyXML versions 0.6.0 or greater will replace the
+\module{xml} package shipped with Python, and will be a strict
+superset of the standard package, adding a number of
+features. Some of the additional features in PyXML include:
+
+\begin{itemize}
+\item 4DOM, a full DOM implementation
+from FourThought LLC.
+\item The xmlproc validating parser, written by Lars Marius Garshol.
+\item The \module{sgmlop} parser accelerator module, written by Fredrik Lundh.
+\end{itemize}
% ======================================================================
\section{Module changes}
@@ -982,6 +1136,8 @@ standard library; some of the affected modules include
and \module{nntplib}. Consult the CVS logs for the exact
patch-by-patch details.
+% XXX gettext support
+
Brian Gallew contributed OpenSSL support for the \module{socket}
module. OpenSSL is an implementation of the Secure Socket Layer,
which encrypts the data being sent over a socket. When compiling