author Christian Heimes <christian@cheimes.de> 2013-03-26 16:47:23 (GMT)
committer Christian Heimes <christian@cheimes.de> 2013-03-26 16:47:23 (GMT)
commit 768f6a53601a6c4e0b914aaedb977dd2ca97532a
tree 0a15e62fa957038dd0e6ad2cd704d3378ac336a5 /Doc/library
parent c40f97f8beaacfb834d3f4f22d581e37dd82c14d
parent 7380a67267d9ec59b70617ea59ff31819f530942
Issue 17538: Document XML vulnerabilities
Diffstat (limited to 'Doc/library')
-rw-r--r-- Doc/library/pyexpat.rst 7
-rw-r--r-- Doc/library/xml.dom.minidom.rst 8
-rw-r--r-- Doc/library/xml.dom.pulldom.rst 8
-rw-r--r-- Doc/library/xml.etree.elementtree.rst 7
-rw-r--r-- Doc/library/xml.rst 104
-rw-r--r-- Doc/library/xml.sax.rst 8
-rw-r--r-- Doc/library/xmlrpc.client.rst 7
-rw-r--r-- Doc/library/xmlrpc.server.rst 7
8 files changed, 156 insertions, 0 deletions
diff --git a/Doc/library/pyexpat.rst b/Doc/library/pyexpat.rst
index 861546c..420e407 100644
--- a/Doc/library/pyexpat.rst
+++ b/Doc/library/pyexpat.rst
@@ -14,6 +14,13 @@
references to these attributes should be marked using the :member: role.
+.. warning::
+
+ The :mod:`pyexpat` module is not secure against maliciously
+ constructed data. If you need to parse untrusted or unauthenticated data see
+ :ref:`xml-vulnerabilities`.
+
+
.. index:: single: Expat
The :mod:`xml.parsers.expat` module is a Python interface to the Expat
diff --git a/Doc/library/xml.dom.minidom.rst b/Doc/library/xml.dom.minidom.rst
index a75325f..e90c177 100644
--- a/Doc/library/xml.dom.minidom.rst
+++ b/Doc/library/xml.dom.minidom.rst
@@ -17,6 +17,14 @@ to be simpler than the full DOM and also significantly smaller. Users who are
not already proficient with the DOM should consider using the
:mod:`xml.etree.ElementTree` module for their XML processing instead
+
+.. warning::
+
+ The :mod:`xml.dom.minidom` module is not secure against
+ maliciously constructed data. If you need to parse untrusted or
+ unauthenticated data see :ref:`xml-vulnerabilities`.
+
+
DOM applications typically start by parsing some XML into a DOM. With
:mod:`xml.dom.minidom`, this is done through the parse functions::
diff --git a/Doc/library/xml.dom.pulldom.rst b/Doc/library/xml.dom.pulldom.rst
index eb16a09..8aa9cfb 100644
--- a/Doc/library/xml.dom.pulldom.rst
+++ b/Doc/library/xml.dom.pulldom.rst
@@ -17,6 +17,14 @@ processing model together with callbacks, the user of a pull parser is
responsible for explicitly pulling events from the stream, looping over those
events until either processing is finished or an error condition occurs.
+
+.. warning::
+
+ The :mod:`xml.dom.pulldom` module is not secure against
+ maliciously constructed data. If you need to parse untrusted or
+ unauthenticated data see :ref:`xml-vulnerabilities`.
+
+
Example::
from xml.dom import pulldom
diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst
index 144e344..e429f04 100644
--- a/Doc/library/xml.etree.elementtree.rst
+++ b/Doc/library/xml.etree.elementtree.rst
@@ -12,6 +12,13 @@ for parsing and creating XML data.
This module will use a fast implementation whenever available.
The :mod:`xml.etree.cElementTree` module is deprecated.
+
+.. warning::
+
+ The :mod:`xml.etree.ElementTree` module is not secure against
+ maliciously constructed data. If you need to parse untrusted or
+ unauthenticated data see :ref:`xml-vulnerabilities`.
+
Tutorial
--------
diff --git a/Doc/library/xml.rst b/Doc/library/xml.rst
index 21b2e23..b86d51a 100644
--- a/Doc/library/xml.rst
+++ b/Doc/library/xml.rst
@@ -3,8 +3,21 @@
XML Processing Modules
======================
+.. module:: xml
+ :synopsis: Package containing XML processing modules
+.. sectionauthor:: Christian Heimes <christian@python.org>
+.. sectionauthor:: Georg Brandl <georg@python.org>
+
+
Python's interfaces for processing XML are grouped in the ``xml`` package.
+.. warning::
+
+ The XML modules are not secure against erroneous or maliciously
+ constructed data. If you need to parse untrusted or unauthenticated data see
+ :ref:`xml-vulnerabilities`.
+
+
It is important to note that modules in the :mod:`xml` package require that
there be at least one SAX-compliant XML parser available. The Expat parser is
included with Python, so the :mod:`xml.parsers.expat` module will always be
@@ -27,3 +40,94 @@ The XML handling submodules are:
* :mod:`xml.sax`: SAX2 base classes and convenience functions
* :mod:`xml.parsers.expat`: the Expat parser binding
+
+
+.. _xml-vulnerabilities:
+
+XML vulnerabilities
+===================
+
+The XML processing modules are not secure against maliciously constructed data.
+An attacker can abuse these vulnerabilities to mount denial of service attacks,
+access local files, open network connections to other machines, or
+circumvent firewalls. The attacks on XML abuse unfamiliar features
+like inline `DTD`_ (document type definition) with entities.
+
+
+========================= ======== ========= ========= ======== =========
+kind sax etree minidom pulldom xmlrpc
+========================= ======== ========= ========= ======== =========
+billion laughs **True** **True** **True** **True** **True**
+quadratic blowup **True** **True** **True** **True** **True**
+external entity expansion **True** False (1) False (2) **True** False (3)
+DTD retrieval **True** False False **True** False
+decompression bomb False False False False **True**
+========================= ======== ========= ========= ======== =========
+
+1. :mod:`xml.etree.ElementTree` doesn't expand external entities and raises a
+ :exc:`ParseError` when an entity occurs.
+2. :mod:`xml.dom.minidom` doesn't expand external entities and simply returns
+ the unexpanded entity verbatim.
+3. :mod:`xmlrpc.client` doesn't expand external entities and omits them.
+
+
+billion laughs / exponential entity expansion
+ The `Billion Laughs`_ attack -- also known as exponential entity expansion --
+ uses multiple levels of nested entities. Each entity refers to another entity
+ several times, and the final entity definition contains a small string. Eventually
+ the small string is expanded to several gigabytes. The exponential expansion
+ consumes lots of CPU time, too.
+
+quadratic blowup entity expansion
+ A quadratic blowup attack is similar to a `Billion Laughs`_ attack; it abuses
+ entity expansion, too. Instead of nested entities, it repeats one large entity
+ with a couple of thousand characters over and over again. The attack isn't as
+ efficient as the exponential case but it avoids triggering countermeasures of
+ parsers against heavily nested entities.
+
+external entity expansion
+ Entity declarations can contain more than just text for replacement. They can
+ also point to external resources by public identifiers or system identifiers.
+ System identifiers are standard URIs or can refer to local files. The XML
+ parser retrieves the resource with e.g. HTTP or FTP requests and embeds the
+ content into the XML document.
+
+DTD retrieval
+ Some XML libraries like Python's :mod:`xml.dom.pulldom` retrieve document type
+ definitions from remote or local locations. The feature has similar
+ implications to the external entity expansion issue.
+
+decompression bomb
+ The issue of decompression bombs (aka `ZIP bomb`_) applies to all XML libraries
+ that can parse compressed XML streams like gzipped HTTP streams or LZMA-ed
+ files. For an attacker it can reduce the amount of transmitted data by three
+ orders of magnitude or more.
+
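The arithmetic behind the two entity expansion attacks described above can be sketched as follows. This is a minimal illustration, not an exploit: ``nested_entity_doc`` and ``expanded_size`` are hypothetical helper names, the entity names ``e0`` .. ``eN`` are made up, and the document that is actually parsed is kept tiny on purpose. Larger cases are only computed, never parsed.

```python
import xml.etree.ElementTree as ET


def nested_entity_doc(depth: int, fanout: int, seed: str = "ha") -> str:
    """Build a small Billion Laughs style document (hypothetical helper).

    e0 holds the seed string; each higher level refers to the level
    below it `fanout` times.
    """
    decls = [f'<!ENTITY e0 "{seed}">']
    for level in range(1, depth + 1):
        refs = f"&e{level - 1};" * fanout
        decls.append(f'<!ENTITY e{level} "{refs}">')
    dtd = "\n  ".join(decls)
    return f"<!DOCTYPE r [\n  {dtd}\n]>\n<r>&e{depth};</r>"


def expanded_size(depth: int, fanout: int, seed_len: int) -> int:
    """Length of the text after full expansion: seed_len * fanout ** depth."""
    return seed_len * fanout ** depth


# Deliberately harmless parameters: 3 levels, 3 references per level.
doc = nested_entity_doc(depth=3, fanout=3)
text = ET.fromstring(doc).text
print(len(doc), len(text))

# Exponential case, computed only: 9 levels of 10 references each
# expand a 3 character seed to 3 * 10**9 characters.
print(expanded_size(depth=9, fanout=10, seed_len=3))

# Quadratic case, computed only: one flat entity of n characters
# referenced m times costs roughly n + 3*m bytes to send ("&a;" is
# 3 bytes) but expands to n * m characters.
n, m = 50_000, 50_000
print(n + m * len("&a;"), n * m)
```

The parsed example expands a document of about a hundred bytes into 54 characters; the same structure with more levels grows exponentially, while the flat variant grows with the product of entity size and reference count.
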
+The documentation of `defusedxml`_ on PyPI has further information about
+all known attack vectors with examples and references.
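
For the decompression bomb item, the following sketch shows how small a highly compressible payload becomes and one possible way to cap the output size when decompressing. It assumes zlib-compressed input; ``bounded_decompress`` and its ``limit`` parameter are hypothetical names chosen for this illustration.

```python
import zlib

# 10 MB of zero bytes: an extreme but simple compressible payload.
payload = b"\0" * 10_000_000
bomb = zlib.compress(payload, 9)
ratio = len(payload) / len(bomb)
print(len(bomb), round(ratio))


def bounded_decompress(data: bytes, limit: int) -> bytes:
    """Decompress zlib data, refusing to produce more than `limit` bytes."""
    decompressor = zlib.decompressobj()
    # Ask for at most limit + 1 bytes so that an overrun is detectable
    # without ever materialising the full output.
    out = decompressor.decompress(data, limit + 1)
    if len(out) > limit:
        raise ValueError("decompressed data exceeds the configured limit")
    return out


# A small input passes; the bomb is rejected with a 1 MB cap.
print(bounded_decompress(zlib.compress(b"<r/>"), 100))
try:
    bounded_decompress(bomb, 1_000_000)
except ValueError as exc:
    print("rejected:", exc)
```

The cap has to be applied before the data reaches the XML parser, because the parser only ever sees the already inflated stream.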
+
+defused packages
+----------------
+
+`defusedxml`_ is a pure Python package with modified subclasses of all stdlib
+XML parsers that prevent any potentially malicious operation. Use of this
+package is recommended for any server code that parses untrusted XML data. The
+package also ships with example exploits and extended documentation on more
+XML exploits like XPath injection.
+
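The general idea of refusing a risky feature outright can be sketched with the standard library alone. The example below rejects any document that carries a DOCTYPE declaration, which rules out inline DTDs and therefore entity definitions. It is an illustration under that assumption, not defusedxml's implementation; ``DoctypeRejected`` and ``parse_without_dtd`` are hypothetical names.

```python
from xml.parsers import expat


class DoctypeRejected(ValueError):
    """Hypothetical exception raised by this sketch."""


def parse_without_dtd(data: bytes) -> None:
    """Parse `data` with Expat, aborting as soon as a DOCTYPE is seen."""
    parser = expat.ParserCreate()

    def on_doctype(name, sysid, pubid, has_internal_subset):
        # Raising inside a handler aborts parsing; the exception
        # propagates out of parser.Parse().
        raise DoctypeRejected(f"DOCTYPE declaration for {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)


parse_without_dtd(b"<r>plain document</r>")
try:
    parse_without_dtd(b'<!DOCTYPE r [<!ENTITY a "x">]><r>&a;</r>')
except DoctypeRejected as exc:
    print("rejected:", exc)
```

Rejecting every DOCTYPE is stricter than necessary for some applications, since it also refuses harmless DTDs; it trades compatibility for a simple rule.
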
+`defusedexpat`_ provides a modified libexpat and a patched replacement
+:mod:`pyexpat` extension module with countermeasures against entity expansion
+DoS attacks. Defusedexpat still allows a sane and configurable amount of entity
+expansions. The modifications will be merged into future releases of Python.
+
+The workarounds and modifications are not included in patch releases as they
+break backward compatibility. After all, inline DTD and entity expansion are
+well-defined XML features.
+
+
+.. _defusedxml: https://pypi.python.org/pypi/defusedxml/
+.. _defusedexpat: https://pypi.python.org/pypi/defusedexpat/
+.. _Billion Laughs: http://en.wikipedia.org/wiki/Billion_laughs
+.. _ZIP bomb: http://en.wikipedia.org/wiki/Zip_bomb
+.. _DTD: http://en.wikipedia.org/wiki/Document_Type_Definition
+
diff --git a/Doc/library/xml.sax.rst b/Doc/library/xml.sax.rst
index 1bf55b4..d5c56b6 100644
--- a/Doc/library/xml.sax.rst
+++ b/Doc/library/xml.sax.rst
@@ -13,6 +13,14 @@ Simple API for XML (SAX) interface for Python. The package itself provides the
SAX exceptions and the convenience functions which will be most used by users of
the SAX API.
+
+.. warning::
+
+ The :mod:`xml.sax` module is not secure against maliciously
+ constructed data. If you need to parse untrusted or unauthenticated data see
+ :ref:`xml-vulnerabilities`.
+
+
The convenience functions are:
diff --git a/Doc/library/xmlrpc.client.rst b/Doc/library/xmlrpc.client.rst
index 1871c99..3a53655 100644
--- a/Doc/library/xmlrpc.client.rst
+++ b/Doc/library/xmlrpc.client.rst
@@ -21,6 +21,13 @@ supports writing XML-RPC client code; it handles all the details of translating
between conformable Python objects and XML on the wire.
+.. warning::
+
+ The :mod:`xmlrpc.client` module is not secure against maliciously
+ constructed data. If you need to parse untrusted or unauthenticated data see
+ :ref:`xml-vulnerabilities`.
+
+
.. class:: ServerProxy(uri, transport=None, encoding=None, verbose=False, \
allow_none=False, use_datetime=False, \
use_builtin_types=False)
diff --git a/Doc/library/xmlrpc.server.rst b/Doc/library/xmlrpc.server.rst
index 6493fd4..18fee2f 100644
--- a/Doc/library/xmlrpc.server.rst
+++ b/Doc/library/xmlrpc.server.rst
@@ -16,6 +16,13 @@ servers written in Python. Servers can either be free standing, using
:class:`CGIXMLRPCRequestHandler`.
+.. warning::
+
+ The :mod:`xmlrpc.server` module is not secure against maliciously
+ constructed data. If you need to parse untrusted or unauthenticated data see
+ :ref:`xml-vulnerabilities`.
+
+
.. class:: SimpleXMLRPCServer(addr, requestHandler=SimpleXMLRPCRequestHandler,\
logRequests=True, allow_none=False, encoding=None,\
bind_and_activate=True, use_builtin_types=False)