diff options
Diffstat (limited to 'Doc/library/xml.sax.reader.rst')
-rw-r--r-- | Doc/library/xml.sax.reader.rst | 386 |
1 files changed, 386 insertions, 0 deletions
diff --git a/Doc/library/xml.sax.reader.rst b/Doc/library/xml.sax.reader.rst new file mode 100644 index 0000000..d64a4fc --- /dev/null +++ b/Doc/library/xml.sax.reader.rst @@ -0,0 +1,386 @@ + +:mod:`xml.sax.xmlreader` --- Interface for XML parsers +====================================================== + +.. module:: xml.sax.xmlreader + :synopsis: Interface which SAX-compliant XML parsers must implement. +.. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no> +.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de> + + +.. versionadded:: 2.0 + +SAX parsers implement the :class:`XMLReader` interface. They are implemented in +a Python module, which must provide a function :func:`create_parser`. This +function is invoked by :func:`xml.sax.make_parser` with no arguments to create +a new parser object. + + +.. class:: XMLReader() + + Base class which can be inherited by SAX parsers. + + +.. class:: IncrementalParser() + + In some cases, it is desirable not to parse an input source at once, but to feed + chunks of the document as they get available. Note that the reader will normally + not read the entire file, but read it in chunks as well; still :meth:`parse` + won't return until the entire document is processed. So these interfaces should + be used if the blocking behaviour of :meth:`parse` is not desirable. + + When the parser is instantiated it is ready to begin accepting data from the + feed method immediately. After parsing has been finished with a call to close + the reset method must be called to make the parser ready to accept new data, + either from feed or using the parse method. + + Note that these methods must *not* be called during parsing, that is, after + parse has been called and before it returns. + + By default, the class also implements the parse method of the XMLReader + interface using the feed, close and reset methods of the IncrementalParser + interface as a convenience to SAX 2.0 driver writers. + + +.. class:: Locator() + + Interface for associating a SAX event with a document location. A locator object + will return valid results only during calls to DocumentHandler methods; at any + other time, the results are unpredictable. If information is not available, + methods may return ``None``. + + +.. class:: InputSource([systemId]) + + Encapsulation of the information needed by the :class:`XMLReader` to read + entities. + + This class may include information about the public identifier, system + identifier, byte stream (possibly with character encoding information) and/or + the character stream of an entity. + + Applications will create objects of this class for use in the + :meth:`XMLReader.parse` method and for returning from + EntityResolver.resolveEntity. + + An :class:`InputSource` belongs to the application, the :class:`XMLReader` is + not allowed to modify :class:`InputSource` objects passed to it from the + application, although it may make copies and modify those. + + +.. class:: AttributesImpl(attrs) + + This is an implementation of the :class:`Attributes` interface (see section + :ref:`attributes-objects`). This is a dictionary-like object which + represents the element attributes in a :meth:`startElement` call. In addition + to the most useful dictionary operations, it supports a number of other + methods as described by the interface. Objects of this class should be + instantiated by readers; *attrs* must be a dictionary-like object containing + a mapping from attribute names to attribute values. + + +.. class:: AttributesNSImpl(attrs, qnames) + + Namespace-aware variant of :class:`AttributesImpl`, which will be passed to + :meth:`startElementNS`. It is derived from :class:`AttributesImpl`, but + understands attribute names as two-tuples of *namespaceURI* and + *localname*. In addition, it provides a number of methods expecting qualified + names as they appear in the original document. This class implements the + :class:`AttributesNS` interface (see section :ref:`attributes-ns-objects`). + + +.. _xmlreader-objects: + +XMLReader Objects +----------------- + +The :class:`XMLReader` interface supports the following methods: + + +.. method:: XMLReader.parse(source) + + Process an input source, producing SAX events. The *source* object can be a + system identifier (a string identifying the input source -- typically a file + name or an URL), a file-like object, or an :class:`InputSource` object. When + :meth:`parse` returns, the input is completely processed, and the parser object + can be discarded or reset. As a limitation, the current implementation only + accepts byte streams; processing of character streams is for further study. + + +.. method:: XMLReader.getContentHandler() + + Return the current :class:`ContentHandler`. + + +.. method:: XMLReader.setContentHandler(handler) + + Set the current :class:`ContentHandler`. If no :class:`ContentHandler` is set, + content events will be discarded. + + +.. method:: XMLReader.getDTDHandler() + + Return the current :class:`DTDHandler`. + + +.. method:: XMLReader.setDTDHandler(handler) + + Set the current :class:`DTDHandler`. If no :class:`DTDHandler` is set, DTD + events will be discarded. + + +.. method:: XMLReader.getEntityResolver() + + Return the current :class:`EntityResolver`. + + +.. method:: XMLReader.setEntityResolver(handler) + + Set the current :class:`EntityResolver`. If no :class:`EntityResolver` is set, + attempts to resolve an external entity will result in opening the system + identifier for the entity, and fail if it is not available. + + +.. method:: XMLReader.getErrorHandler() + + Return the current :class:`ErrorHandler`. + + +.. method:: XMLReader.setErrorHandler(handler) + + Set the current error handler. If no :class:`ErrorHandler` is set, errors will + be raised as exceptions, and warnings will be printed. + + +.. method:: XMLReader.setLocale(locale) + + Allow an application to set the locale for errors and warnings. + + SAX parsers are not required to provide localization for errors and warnings; if + they cannot support the requested locale, however, they must throw a SAX + exception. Applications may request a locale change in the middle of a parse. + + +.. method:: XMLReader.getFeature(featurename) + + Return the current setting for feature *featurename*. If the feature is not + recognized, :exc:`SAXNotRecognizedException` is raised. The well-known + featurenames are listed in the module :mod:`xml.sax.handler`. + + +.. method:: XMLReader.setFeature(featurename, value) + + Set the *featurename* to *value*. If the feature is not recognized, + :exc:`SAXNotRecognizedException` is raised. If the feature or its setting is not + supported by the parser, *SAXNotSupportedException* is raised. + + +.. method:: XMLReader.getProperty(propertyname) + + Return the current setting for property *propertyname*. If the property is not + recognized, a :exc:`SAXNotRecognizedException` is raised. The well-known + propertynames are listed in the module :mod:`xml.sax.handler`. + + +.. method:: XMLReader.setProperty(propertyname, value) + + Set the *propertyname* to *value*. If the property is not recognized, + :exc:`SAXNotRecognizedException` is raised. If the property or its setting is + not supported by the parser, *SAXNotSupportedException* is raised. + + +.. _incremental-parser-objects: + +IncrementalParser Objects +------------------------- + +Instances of :class:`IncrementalParser` offer the following additional methods: + + +.. method:: IncrementalParser.feed(data) + + Process a chunk of *data*. + + +.. method:: IncrementalParser.close() + + Assume the end of the document. That will check well-formedness conditions that + can be checked only at the end, invoke handlers, and may clean up resources + allocated during parsing. + + +.. method:: IncrementalParser.reset() + + This method is called after close has been called to reset the parser so that it + is ready to parse new documents. The results of calling parse or feed after + close without calling reset are undefined. + + +.. _locator-objects: + +Locator Objects +--------------- + +Instances of :class:`Locator` provide these methods: + + +.. method:: Locator.getColumnNumber() + + Return the column number where the current event ends. + + +.. method:: Locator.getLineNumber() + + Return the line number where the current event ends. + + +.. method:: Locator.getPublicId() + + Return the public identifier for the current event. + + +.. method:: Locator.getSystemId() + + Return the system identifier for the current event. + + +.. _input-source-objects: + +InputSource Objects +------------------- + + +.. method:: InputSource.setPublicId(id) + + Sets the public identifier of this :class:`InputSource`. + + +.. method:: InputSource.getPublicId() + + Returns the public identifier of this :class:`InputSource`. + + +.. method:: InputSource.setSystemId(id) + + Sets the system identifier of this :class:`InputSource`. + + +.. method:: InputSource.getSystemId() + + Returns the system identifier of this :class:`InputSource`. + + +.. method:: InputSource.setEncoding(encoding) + + Sets the character encoding of this :class:`InputSource`. + + The encoding must be a string acceptable for an XML encoding declaration (see + section 4.3.3 of the XML recommendation). + + The encoding attribute of the :class:`InputSource` is ignored if the + :class:`InputSource` also contains a character stream. + + +.. method:: InputSource.getEncoding() + + Get the character encoding of this InputSource. + + +.. method:: InputSource.setByteStream(bytefile) + + Set the byte stream (a Python file-like object which does not perform + byte-to-character conversion) for this input source. + + The SAX parser will ignore this if there is also a character stream specified, + but it will use a byte stream in preference to opening a URI connection itself. + + If the application knows the character encoding of the byte stream, it should + set it with the setEncoding method. + + +.. method:: InputSource.getByteStream() + + Get the byte stream for this input source. + + The getEncoding method will return the character encoding for this byte stream, + or None if unknown. + + +.. method:: InputSource.setCharacterStream(charfile) + + Set the character stream for this input source. (The stream must be a Python 1.6 + Unicode-wrapped file-like that performs conversion to Unicode strings.) + + If there is a character stream specified, the SAX parser will ignore any byte + stream and will not attempt to open a URI connection to the system identifier. + + +.. method:: InputSource.getCharacterStream() + + Get the character stream for this input source. + + +.. _attributes-objects: + +The :class:`Attributes` Interface +--------------------------------- + +:class:`Attributes` objects implement a portion of the mapping protocol, +including the methods :meth:`copy`, :meth:`get`, :meth:`has_key`, :meth:`items`, +:meth:`keys`, and :meth:`values`. The following methods are also provided: + + +.. method:: Attributes.getLength() + + Return the number of attributes. + + +.. method:: Attributes.getNames() + + Return the names of the attributes. + + +.. method:: Attributes.getType(name) + + Returns the type of the attribute *name*, which is normally ``'CDATA'``. + + +.. method:: Attributes.getValue(name) + + Return the value of attribute *name*. + +.. % getValueByQName, getNameByQName, getQNameByName, getQNames available +.. % here already, but documented only for derived class. + + +.. _attributes-ns-objects: + +The :class:`AttributesNS` Interface +----------------------------------- + +This interface is a subtype of the :class:`Attributes` interface (see section +:ref:`attributes-objects`). All methods supported by that interface are also +available on :class:`AttributesNS` objects. + +The following methods are also available: + + +.. method:: AttributesNS.getValueByQName(name) + + Return the value for a qualified name. + + +.. method:: AttributesNS.getNameByQName(name) + + Return the ``(namespace, localname)`` pair for a qualified *name*. + + +.. method:: AttributesNS.getQNameByName(name) + + Return the qualified name for a ``(namespace, localname)`` pair. + + +.. method:: AttributesNS.getQNames() + + Return the qualified names of all attributes. + |