author | Fred Drake <fdrake@acm.org> | 2000-10-12 20:05:09 (GMT) |
---|---|---|
committer | Fred Drake <fdrake@acm.org> | 2000-10-12 20:05:09 (GMT) |
commit | 014f0e33946c212e60604b4cf819cfec9fabb041 (patch) | |
tree | 515bc8be5ed1ecfa28d94e0d6036601eab1ecaca /Doc/lib/xmlsaxhandler.tex | |
parent | 110941a4ba037a3cab0224370aed71c465a2f865 (diff) | |
Massive addition of SAX documentation from Martin von Loewis
<loewis@informatik.hu-berlin.de>.
Reorganized to be more like other parts of the documentation in its
arrangement, but with few content changes.
This closes SourceForge patch #101850.
Diffstat (limited to 'Doc/lib/xmlsaxhandler.tex')
-rw-r--r-- | Doc/lib/xmlsaxhandler.tex | 321 |
1 files changed, 321 insertions, 0 deletions
diff --git a/Doc/lib/xmlsaxhandler.tex b/Doc/lib/xmlsaxhandler.tex
new file mode 100644
index 0000000..77acc3e
--- /dev/null
+++ b/Doc/lib/xmlsaxhandler.tex
@@ -0,0 +1,321 @@
+\section{\module{xml.sax.handler} ---
+         Base classes for SAX handlers}
+
+\declaremodule{standard}{xml.sax.handler}
+\modulesynopsis{Base classes for SAX event handlers.}
+\sectionauthor{Martin v. L\"owis}{loewis@informatik.hu-berlin.de}
+\moduleauthor{Lars Marius Garshol}{larsga@garshol.priv.no}
+
+\versionadded{2.0}
+
+
+The SAX API defines four kinds of handlers: content handlers, DTD
+handlers, error handlers, and entity resolvers. Applications normally
+only need to implement those interfaces whose events they are
+interested in; they can implement the interfaces in a single object or
+in multiple objects. Handler implementations should inherit from the
+base classes provided in the module \module{xml.sax}, so that all
+methods get default implementations.
+
+\begin{classdesc}{ContentHandler}{}
+  This is the main callback interface in SAX, and the one most
+  important to applications. The order of events in this interface
+  mirrors the order of the information in the document.
+\end{classdesc}
+
+\begin{classdesc}{DTDHandler}{}
+  Handle DTD events.
+
+  This interface specifies only those DTD events required for basic
+  parsing (unparsed entities and attributes).
+\end{classdesc}
+
+\begin{classdesc}{EntityResolver}{}
+  Basic interface for resolving entities. If you create an object
+  implementing this interface, then register the object with your
+  Parser, the parser will call the method in your object to resolve all
+  external entities.
+\end{classdesc}
+
+In addition to these classes, \module{xml.sax.handler} provides
+symbolic constants for the feature and property names.
+
+\begin{datadesc}{feature_namespaces}
+  Value: \code{"http://xml.org/sax/features/namespaces"}\\
+  true: Perform Namespace processing (default).\\
+  false: Optionally do not perform Namespace processing
+         (implies namespace-prefixes).\\
+  access: (parsing) read-only; (not parsing) read/write\\
+\end{datadesc}
+
+\begin{datadesc}{feature_namespace_prefixes}
+  Value: \code{"http://xml.org/sax/features/namespace-prefixes"}\\
+  true: Report the original prefixed names and attributes used for Namespace
+        declarations.\\
+  false: Do not report attributes used for Namespace declarations, and
+         optionally do not report original prefixed names (default).\\
+  access: (parsing) read-only; (not parsing) read/write
+\end{datadesc}
+
+\begin{datadesc}{feature_string_interning}
+  Value: \code{"http://xml.org/sax/features/string-interning"}\\
+  true: All element names, prefixes, attribute names, Namespace URIs, and
+        local names are interned using the built-in intern function.\\
+  false: Names are not necessarily interned, although they may be (default).\\
+  access: (parsing) read-only; (not parsing) read/write
+\end{datadesc}
+
+\begin{datadesc}{feature_validation}
+  Value: \code{"http://xml.org/sax/features/validation"}\\
+  true: Report all validation errors (implies external-general-entities and
+        external-parameter-entities).\\
+  false: Do not report validation errors.\\
+  access: (parsing) read-only; (not parsing) read/write
+\end{datadesc}
+
+\begin{datadesc}{feature_external_ges}
+  Value: \code{"http://xml.org/sax/features/external-general-entities"}\\
+  true: Include all external general (text) entities.\\
+  false: Do not include external general entities.\\
+  access: (parsing) read-only; (not parsing) read/write
+\end{datadesc}
+
+\begin{datadesc}{feature_external_pes}
+  Value: \code{"http://xml.org/sax/features/external-parameter-entities"}\\
+  true: Include all external parameter entities, including the external
+        DTD subset.\\
+  false: Do not include any external
parameter entities, even the external
+         DTD subset.\\
+  access: (parsing) read-only; (not parsing) read/write
+\end{datadesc}
+
+\begin{datadesc}{all_features}
+  List of all features.
+\end{datadesc}
+
+\begin{datadesc}{property_lexical_handler}
+  Value: \code{"http://xml.org/sax/properties/lexical-handler"}\\
+  data type: xml.sax.sax2lib.LexicalHandler (not supported in Python 2)\\
+  description: An optional extension handler for lexical events like comments.\\
+  access: read/write
+\end{datadesc}
+
+\begin{datadesc}{property_declaration_handler}
+  Value: \code{"http://xml.org/sax/properties/declaration-handler"}\\
+  data type: xml.sax.sax2lib.DeclHandler (not supported in Python 2)\\
+  description: An optional extension handler for DTD-related events other
+               than notations and unparsed entities.\\
+  access: read/write
+\end{datadesc}
+
+\begin{datadesc}{property_dom_node}
+  Value: \code{"http://xml.org/sax/properties/dom-node"}\\
+  data type: org.w3c.dom.Node (not supported in Python 2) \\
+  description: When parsing, the current DOM node being visited if this is
+               a DOM iterator; when not parsing, the root DOM node for
+               iteration.\\
+  access: (parsing) read-only; (not parsing) read/write
+\end{datadesc}
+
+\begin{datadesc}{property_xml_string}
+  Value: \code{"http://xml.org/sax/properties/xml-string"}\\
+  data type: String\\
+  description: The literal string of characters that was the source for
+               the current event.\\
+  access: read-only
+\end{datadesc}
+
+\begin{datadesc}{all_properties}
+  List of all known property names.
+\end{datadesc}
+
+
+\subsection{ContentHandler Objects \label{content-handler-objects}}
+
+Users are expected to subclass \class{ContentHandler} to support their
+application. The following methods are called by the parser on the
+appropriate events in the input document:
+
+\begin{methoddesc}[ContentHandler]{setDocumentLocator}{locator}
+  Called by the parser to give the application a locator for locating
+  the origin of document events.
+
+  SAX parsers are strongly encouraged (though not absolutely required)
+  to supply a locator: a parser that does so must supply the locator to
+  the application by invoking this method before invoking any of the
+  other methods in the ContentHandler interface.
+
+  The locator allows the application to determine the end position of
+  any document-related event, even if the parser is not reporting an
+  error. Typically, the application will use this information for
+  reporting its own errors (such as character content that does not
+  match an application's business rules). The information returned by
+  the locator is probably not sufficient for use with a search engine.
+
+  Note that the locator will return correct information only during
+  the invocation of the events in this interface. The application
+  should not attempt to use it at any other time.
+\end{methoddesc}
+
+\begin{methoddesc}[ContentHandler]{startDocument}{}
+  Receive notification of the beginning of a document.
+
+  The SAX parser will invoke this method only once, before any other
+  methods in this interface or in DTDHandler (except for
+  \method{setDocumentLocator()}).
+\end{methoddesc}
+
+\begin{methoddesc}[ContentHandler]{endDocument}{}
+  Receive notification of the end of a document.
+
+  The SAX parser will invoke this method only once, and it will be the
+  last method invoked during the parse. The parser shall not invoke
+  this method until it has either abandoned parsing (because of an
+  unrecoverable error) or reached the end of input.
+\end{methoddesc}
+
+\begin{methoddesc}[ContentHandler]{startPrefixMapping}{prefix, uri}
+  Begin the scope of a prefix-URI Namespace mapping.
+
+  The information from this event is not necessary for normal
+  Namespace processing: the SAX XML reader will automatically replace
+  prefixes for element and attribute names when the
+  \code{http://xml.org/sax/features/namespaces} feature is true (the
+  default).
+
+%% XXX This is not really the default, is it?
MvL
+
+  There are cases, however, when applications need to use prefixes in
+  character data or in attribute values, where they cannot safely be
+  expanded automatically; the start/endPrefixMapping event supplies
+  the information to the application to expand prefixes in those
+  contexts itself, if necessary.
+
+  Note that start/endPrefixMapping events are not guaranteed to be
+  properly nested relative to each other: all
+  \method{startPrefixMapping()} events will occur before the
+  corresponding startElement event, and all \method{endPrefixMapping()}
+  events will occur after the corresponding \method{endElement()} event,
+  but their order is not guaranteed.
+\end{methoddesc}
+
+\begin{methoddesc}[ContentHandler]{endPrefixMapping}{prefix}
+  End the scope of a prefix-URI mapping.
+
+  See \method{startPrefixMapping()} for details. This event will always
+  occur after the corresponding endElement event, but the order of
+  endPrefixMapping events is not otherwise guaranteed.
+\end{methoddesc}
+
+\begin{methoddesc}[ContentHandler]{startElement}{name, attrs}
+  Signals the start of an element in non-namespace mode.
+
+  The \var{name} parameter contains the raw XML 1.0 name of the
+  element type as a string and the \var{attrs} parameter holds an
+  instance of the \class{Attributes} class containing the attributes
+  of the element.
+\end{methoddesc}
+
+\begin{methoddesc}[ContentHandler]{endElement}{name}
+  Signals the end of an element in non-namespace mode.
+
+  The \var{name} parameter contains the name of the element type, just
+  as with the startElement event.
+\end{methoddesc}
+
+\begin{methoddesc}[ContentHandler]{startElementNS}{name, qname, attrs}
+  Signals the start of an element in namespace mode.
+
+  The \var{name} parameter contains the name of the element type as a
+  (uri, localname) tuple, the \var{qname} parameter the raw XML 1.0
+  name used in the source document, and the \var{attrs} parameter
+  holds an instance of the \class{AttributesNS} class containing the
+  attributes of the element.
+
+  Parsers may set the \var{qname} parameter to \code{None}, unless the
+  \code{http://xml.org/sax/features/namespace-prefixes} feature is
+  activated.
+\end{methoddesc}
+
+\begin{methoddesc}[ContentHandler]{endElementNS}{name, qname}
+  Signals the end of an element in namespace mode.
+
+  The \var{name} parameter contains the name of the element type, just
+  as with the startElementNS event, likewise the \var{qname} parameter.
+\end{methoddesc}
+
+\begin{methoddesc}[ContentHandler]{characters}{content}
+  Receive notification of character data.
+
+  The Parser will call this method to report each chunk of character
+  data. SAX parsers may return all contiguous character data in a
+  single chunk, or they may split it into several chunks; however, all
+  of the characters in any single event must come from the same
+  external entity so that the Locator provides useful information.
+
+  \var{content} may be a Unicode string or a byte string; the
+  \code{expat} reader module always produces Unicode strings.
+\end{methoddesc}
+
+\begin{methoddesc}[ContentHandler]{ignorableWhitespace}{whitespace}
+  Receive notification of ignorable whitespace in element content.
+
+  Validating Parsers must use this method to report each chunk
+  of ignorable whitespace (see the W3C XML 1.0 recommendation,
+  section 2.10): non-validating parsers may also use this method
+  if they are capable of parsing and using content models.
+
+  SAX parsers may return all contiguous whitespace in a single
+  chunk, or they may split it into several chunks; however, all
+  of the characters in any single event must come from the same
+  external entity, so that the Locator provides useful
+  information.
+\end{methoddesc}
+
+\begin{methoddesc}[ContentHandler]{processingInstruction}{target, data}
+  Receive notification of a processing instruction.
+
+  The Parser will invoke this method once for each processing
+  instruction found: note that processing instructions may occur
+  before or after the main document element.
+
+  A SAX parser should never report an XML declaration (XML 1.0,
+  section 2.8) or a text declaration (XML 1.0, section 4.3.1) using
+  this method.
+\end{methoddesc}
+
+\begin{methoddesc}[ContentHandler]{skippedEntity}{name}
+  Receive notification of a skipped entity.
+
+  The Parser will invoke this method once for each entity
+  skipped. Non-validating processors may skip entities if they have
+  not seen the declarations (because, for example, the entity was
+  declared in an external DTD subset). All processors may skip
+  external entities, depending on the values of the
+  \code{http://xml.org/sax/features/external-general-entities} and the
+  \code{http://xml.org/sax/features/external-parameter-entities}
+  features.
+\end{methoddesc}
+
+
+\subsection{DTDHandler Objects \label{dtd-handler-objects}}
+
+\class{DTDHandler} instances provide the following methods:
+
+\begin{methoddesc}[DTDHandler]{notationDecl}{name, publicId, systemId}
+  Handle a notation declaration event.
+\end{methoddesc}
+
+\begin{methoddesc}[DTDHandler]{unparsedEntityDecl}{name, publicId,
+                                                   systemId, ndata}
+  Handle an unparsed entity declaration event.
+\end{methoddesc}
+
+
+\subsection{EntityResolver Objects \label{entity-resolver-objects}}
+
+\begin{methoddesc}[EntityResolver]{resolveEntity}{publicId, systemId}
+  Resolve the system identifier of an entity and return either the
+  system identifier to read from as a string, or an InputSource to
+  read from. The default implementation returns \var{systemId}.
+\end{methoddesc}
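
The patch itself contains no usage example. As a rough illustration of how the documented handler interface is typically used, the following sketch subclasses `ContentHandler` in both non-namespace and namespace mode. It assumes a current Python 3 standard library rather than the Python 2.0 tree this commit targets, and the names `ElementCounter`, `NamespaceNames`, and the sample documents are hypothetical, not part of the patch.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class ElementCounter(ContentHandler):
    """Hypothetical handler: counts elements and collects character data."""

    def __init__(self):
        super().__init__()
        self.counts = {}
        self.chunks = []

    def startElement(self, name, attrs):
        # Non-namespace mode: name is the raw XML 1.0 element name.
        self.counts[name] = self.counts.get(name, 0) + 1

    def characters(self, content):
        # The parser may deliver contiguous text in several chunks,
        # so the chunks are accumulated and joined afterwards.
        self.chunks.append(content)


class NamespaceNames(ContentHandler):
    """Hypothetical handler: records (uri, localname) tuples."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attrs):
        # Namespace mode: name is a (uri, localname) tuple.
        self.names.append(name)


# Non-namespace mode, the behaviour of the bundled expat reader by default.
counter = ElementCounter()
xml.sax.parseString(b"<doc><item>a</item><item>b</item></doc>", counter)
counts = counter.counts
text = "".join(counter.chunks)

# Namespace mode, switched on through the symbolic feature constant.
parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
ns_handler = NamespaceNames()
parser.setContentHandler(ns_handler)
parser.parse(io.BytesIO(b'<d xmlns="urn:x"><i/></d>'))
names = ns_handler.names
```

Only the callbacks an application cares about are overridden; the remaining events fall through to the default implementations inherited from the base class, which is the design the module introduction describes.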