author: Gregory P. Smith <greg@krypto.org>, 2022-09-05 20:26:09 (GMT)
committer: GitHub <noreply@github.com>, 2022-09-05 20:26:09 (GMT)
commit: b5e331fdb38684808ffc540d53e8595bdc408b89
tree: fff15beb4402c977a0a4dc51aaeab8976039650b (/Doc)
parent: 4f100fe9f1c691145e3fa959ef324646e303cdf3
[3.8] gh-95778: CVE-2020-10735: Prevent DoS by very large int() (#96503)
* Correctly pre-check for int-to-str conversion

Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:

```c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10
```

In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:

$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that

$$\frac{M}{3L} < \frac{s-1}{10}$$

hence that

$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$

So

$$2^{L(s-1)} > 10^M.$$

But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
Co-authored-by: Christian Heimes <christian@python.org>
Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
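Editorial note, not part of the commit: the inequality chain above can be spot-checked numerically. The sketch below mirrors the C expression with Python floor division and assumes `PyLong_SHIFT` is 30 (15 on some builds); the function names are hypothetical, and `size_a` is restricted to values of at least 11 so that floor division agrees with C's truncating division.

```python
# Brute-force check: whenever the pre-check fires, even the smallest integer
# with size_a internal digits already exceeds 10**max_str_digits.


def precheck_rejects(max_str_digits, size_a, shift=30):
    """Mirror of the C pre-check (operands assumed non-negative)."""
    return max_str_digits // (3 * shift) <= (size_a - 11) // 10


def smallest_magnitude(size_a, shift=30):
    """Smallest |a| occupying size_a internal digits: 2**(shift * (size_a - 1))."""
    return 1 << (shift * (size_a - 1))


def check_no_false_positives(shift=30):
    """Return how many (M, s) pairs were rejected; raise if any is below 10**M."""
    rejected = 0
    for max_str_digits in (640, 1000, 4300, 10_000):
        bound = 10 ** max_str_digits
        for size_a in range(11, 1500):
            if precheck_rejects(max_str_digits, size_a, shift):
                assert smallest_magnitude(size_a, shift) > bound
                rejected += 1
    return rejected
```

Calling `check_no_false_positives()` walks a small grid of limits and sizes; it only exercises the bound proven above and says nothing about values that slip past the pre-check, which the later exact check handles.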
Diffstat (limited to 'Doc')
-rw-r--r-- Doc/data/python3.8.abi | 5
-rw-r--r-- Doc/library/functions.rst | 8
-rw-r--r-- Doc/library/json.rst | 11
-rw-r--r-- Doc/library/stdtypes.rst | 159
-rw-r--r-- Doc/library/sys.rst | 59
-rw-r--r-- Doc/library/test.rst | 10
-rw-r--r-- Doc/using/cmdline.rst | 13
-rw-r--r-- Doc/whatsnew/3.8.rst | 14
8 files changed, 265 insertions, 14 deletions
diff --git a/Doc/data/python3.8.abi b/Doc/data/python3.8.abi
index 8a11301..90b2b8b 100644
--- a/Doc/data/python3.8.abi
+++ b/Doc/data/python3.8.abi
@@ -2381,7 +2381,7 @@
</data-member>
</class-decl>
<pointer-type-def type-id='type-id-55' size-in-bits='64' id='type-id-56'/>
- <class-decl name='_is' size-in-bits='21696' is-struct='yes' visibility='default' filepath='./Include/internal/pycore_pystate.h' line='67' column='1' id='type-id-66'>
+ <class-decl name='_is' size-in-bits='21760' is-struct='yes' visibility='default' filepath='./Include/internal/pycore_pystate.h' line='67' column='1' id='type-id-66'>
<data-member access='public' layout-offset-in-bits='0'>
<var-decl name='next' type-id='type-id-67' visibility='default' filepath='./Include/internal/pycore_pystate.h' line='69' column='1'/>
</data-member>
@@ -2490,6 +2490,9 @@
<data-member access='public' layout-offset-in-bits='21632'>
<var-decl name='audit_hooks' type-id='type-id-60' visibility='default' filepath='./Include/internal/pycore_pystate.h' line='137' column='1'/>
</data-member>
+ <data-member access='public' layout-offset-in-bits='21696'>
+ <var-decl name='int_max_str_digits' type-id='type-id-7' visibility='default' filepath='./Include/internal/pycore_pystate.h' line='139' column='1'/>
+ </data-member>
</class-decl>
<pointer-type-def type-id='type-id-66' size-in-bits='64' id='type-id-67'/>
<typedef-decl name='__int64_t' type-id='type-id-36' filepath='/usr/include/x86_64-linux-gnu/bits/types.h' line='44' column='1' id='type-id-77'/>
diff --git a/Doc/library/functions.rst b/Doc/library/functions.rst
index 036dca5..bc0285e 100644
--- a/Doc/library/functions.rst
+++ b/Doc/library/functions.rst
@@ -838,6 +838,14 @@ are always available. They are listed here in alphabetical order.
.. versionchanged:: 3.8
Falls back to :meth:`__index__` if :meth:`__int__` is not defined.
+ .. versionchanged:: 3.8.14
+ :class:`int` string inputs and string representations can be limited to
+ help avoid denial of service attacks. A :exc:`ValueError` is raised when
+ the limit is exceeded while converting a string *x* to an :class:`int` or
+ when converting an :class:`int` into a string would exceed the limit.
+ See the :ref:`integer string conversion length limitation
+ <int_max_str_digits>` documentation.
+
.. function:: isinstance(object, classinfo)
diff --git a/Doc/library/json.rst b/Doc/library/json.rst
index 23e39e9..c1648c7 100644
--- a/Doc/library/json.rst
+++ b/Doc/library/json.rst
@@ -18,6 +18,11 @@ is a lightweight data interchange format inspired by
`JavaScript <https://en.wikipedia.org/wiki/JavaScript>`_ object literal syntax
(although it is not a strict subset of JavaScript [#rfc-errata]_ ).
+.. warning::
+ Be cautious when parsing JSON data from untrusted sources. A malicious
+ JSON string may cause the decoder to consume considerable CPU and memory
+ resources. Limiting the size of data to be parsed is recommended.
+
:mod:`json` exposes an API familiar to users of the standard library
:mod:`marshal` and :mod:`pickle` modules.
@@ -255,6 +260,12 @@ Basic Usage
be used to use another datatype or parser for JSON integers
(e.g. :class:`float`).
+ .. versionchanged:: 3.8.14
+ The default *parse_int* of :func:`int` now limits the maximum length of
+ the integer string via the interpreter's :ref:`integer string
+ conversion length limitation <int_max_str_digits>` to help avoid denial
+ of service attacks.
+
*parse_constant*, if specified, will be called with one of the following
strings: ``'-Infinity'``, ``'Infinity'``, ``'NaN'``.
This can be used to raise an exception if invalid JSON numbers
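Editorial note, not part of the commit: a minimal sketch of the two mitigations the `json` changes describe, bounding the input size and relying on the default `parse_int` limit. The helper names and the `max_chars` value are hypothetical; the limit is set explicitly to the documented default so the demonstration does not depend on how the interpreter was started.

```python
import json
import sys


def parse_untrusted(text, max_chars=100_000, parse_int=int):
    """Reject oversized documents before handing them to json.loads()."""
    if len(text) > max_chars:
        raise ValueError("JSON document exceeds %d characters" % max_chars)
    return json.loads(text, parse_int=parse_int)


def huge_integer_is_rejected(digits=5000):
    """Return True if a `digits`-long JSON integer raises ValueError."""
    if not hasattr(sys, "set_int_max_str_digits"):
        return False  # interpreter predates the limit
    previous = sys.get_int_max_str_digits()
    sys.set_int_max_str_digits(sys.int_info.default_max_str_digits)
    try:
        parse_untrusted("[" + "9" * digits + "]")
    except ValueError:
        return True
    else:
        return False
    finally:
        sys.set_int_max_str_digits(previous)
```

Passing `parse_int=float` sidesteps the integer limit, as the `parse_int` documentation suggests, at the cost of precision: very long digit strings become `inf`.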
diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst
index 28b9d5d..14d48e6 100644
--- a/Doc/library/stdtypes.rst
+++ b/Doc/library/stdtypes.rst
@@ -4870,6 +4870,165 @@ types, where they are relevant. Some of these are not reported by the
[<class 'bool'>]
+.. _int_max_str_digits:
+
+Integer string conversion length limitation
+===========================================
+
+CPython has a global limit for converting between :class:`int` and :class:`str`
+to mitigate denial of service attacks. This limit *only* applies to decimal or
+other non-power-of-two number bases. Hexadecimal, octal, and binary conversions
+are unlimited. The limit can be configured.
+
+The :class:`int` type in CPython is an arbitrary length number stored in binary
+form (commonly known as a "bignum"). There exists no algorithm that can convert
+a string to a binary integer or a binary integer to a string in linear time,
+*unless* the base is a power of 2. Even the best known algorithms for base 10
+have sub-quadratic complexity. Converting a large value such as ``int('1' *
+500_000)`` can take over a second on a fast CPU.
+
+Limiting conversion size offers a practical way to avoid `CVE-2020-10735
+<https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-10735>`_.
+
+The limit is applied to the number of digit characters in the input or output
+string when a non-linear conversion algorithm would be involved. Underscores
+and the sign are not counted towards the limit.
+
+When an operation would exceed the limit, a :exc:`ValueError` is raised:
+
+.. doctest::
+
+ >>> import sys
+ >>> sys.set_int_max_str_digits(4300) # Illustrative, this is the default.
+ >>> _ = int('2' * 5432)
+ Traceback (most recent call last):
+ ...
+ ValueError: Exceeds the limit (4300) for integer string conversion: value has 5432 digits.
+ >>> i = int('2' * 4300)
+ >>> len(str(i))
+ 4300
+ >>> i_squared = i*i
+ >>> len(str(i_squared))
+ Traceback (most recent call last):
+ ...
+ ValueError: Exceeds the limit (4300) for integer string conversion: value has 8599 digits.
+ >>> len(hex(i_squared))
+ 7144
+ >>> assert int(hex(i_squared), base=16) == i*i # Hexadecimal is unlimited.
+
+The default limit is 4300 digits as provided in
+:data:`sys.int_info.default_max_str_digits <sys.int_info>`.
+The lowest limit that can be configured is 640 digits as provided in
+:data:`sys.int_info.str_digits_check_threshold <sys.int_info>`.
+
+Verification:
+
+.. doctest::
+
+ >>> import sys
+ >>> assert sys.int_info.default_max_str_digits == 4300, sys.int_info
+ >>> assert sys.int_info.str_digits_check_threshold == 640, sys.int_info
+ >>> msg = int('578966293710682886880994035146873798396722250538762761564'
+ ... '9252925514383915483333812743580549779436104706260696366600'
+ ... '571186405732').to_bytes(53, 'big')
+ ...
+
+.. versionadded:: 3.8.14
+
+Affected APIs
+-------------
+
+The limitation only applies to potentially slow conversions between :class:`int`
+and :class:`str` or :class:`bytes`:
+
+* ``int(string)`` with default base 10.
+* ``int(string, base)`` for all bases that are not a power of 2.
+* ``str(integer)``.
+* ``repr(integer)``.
+* any other string conversion to base 10, for example ``f"{integer}"``,
+ ``"{}".format(integer)``, or ``b"%d" % integer``.
+
+The limitations do not apply to functions with a linear algorithm:
+
+* ``int(string, base)`` with base 2, 4, 8, 16, or 32.
+* :func:`int.from_bytes` and :func:`int.to_bytes`.
+* :func:`hex`, :func:`oct`, :func:`bin`.
+* :ref:`formatspec` for hex, octal, and binary numbers.
+* :class:`str` to :class:`float`.
+* :class:`str` to :class:`decimal.Decimal`.
+
+Configuring the limit
+---------------------
+
+Before Python starts up you can use an environment variable or an interpreter
+command line flag to configure the limit:
+
+* :envvar:`PYTHONINTMAXSTRDIGITS`, e.g.
+ ``PYTHONINTMAXSTRDIGITS=640 python3`` to set the limit to 640 or
+ ``PYTHONINTMAXSTRDIGITS=0 python3`` to disable the limitation.
+* :option:`-X int_max_str_digits <-X>`, e.g.
+ ``python3 -X int_max_str_digits=640``
+* :data:`sys.flags.int_max_str_digits` contains the value of
+ :envvar:`PYTHONINTMAXSTRDIGITS` or :option:`-X int_max_str_digits <-X>`.
+ If both the env var and the ``-X`` option are set, the ``-X`` option takes
+ precedence. A value of *-1* indicates that both were unset, thus a value of
+ :data:`sys.int_info.default_max_str_digits` was used during initialization.
+
+From code, you can inspect the current limit and set a new one using these
+:mod:`sys` APIs:
+
+* :func:`sys.get_int_max_str_digits` and :func:`sys.set_int_max_str_digits` are
+ a getter and setter for the interpreter-wide limit. Subinterpreters have
+ their own limit.
+
+Information about the default and minimum can be found in :attr:`sys.int_info`:
+
+* :data:`sys.int_info.default_max_str_digits <sys.int_info>` is the compiled-in
+ default limit.
+* :data:`sys.int_info.str_digits_check_threshold <sys.int_info>` is the lowest
+ accepted value for the limit (other than 0 which disables it).
+
+.. versionadded:: 3.8.14
+
+.. caution::
+
+ Setting a low limit *can* lead to problems. While rare, code exists that
+ contains integer constants in decimal in its source that exceed the
+ minimum threshold. A consequence of setting the limit is that Python source
+ code containing decimal integer literals longer than the limit will
+ encounter an error during parsing, usually at startup time or import time or
+ even at installation time - anytime an up to date ``.pyc`` does not already
+ exist for the code. A workaround for source that contains such large
+ constants is to convert them to ``0x`` hexadecimal form as it has no limit.
+
+ Test your application thoroughly if you use a low limit. Ensure your tests
+ run with the limit set early via the environment or flag so that it applies
+ during startup and even during any installation step that may invoke Python
+ to precompile ``.py`` sources to ``.pyc`` files.
+
+Recommended configuration
+-------------------------
+
+The default :data:`sys.int_info.default_max_str_digits` is expected to be
+reasonable for most applications. If your application requires a different
+limit, set it from your main entry point using Python version agnostic code as
+these APIs were added in security patch releases in versions before 3.11.
+
+Example::
+
+ >>> import sys
+ >>> if hasattr(sys, "set_int_max_str_digits"):
+ ...     upper_bound = 68000
+ ...     lower_bound = 4004
+ ...     current_limit = sys.get_int_max_str_digits()
+ ...     if current_limit == 0 or current_limit > upper_bound:
+ ...         sys.set_int_max_str_digits(upper_bound)
+ ...     elif current_limit < lower_bound:
+ ...         sys.set_int_max_str_digits(lower_bound)
+
+If you need to disable it entirely, set it to ``0``.
+
+
.. rubric:: Footnotes
.. [1] Additional information on these special methods may be found in the Python
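Editorial note, not part of the commit: the "Affected APIs" lists above can be exercised directly. This sketch temporarily sets the limit to 640 (the documented minimum), runs a handful of conversions on a value well above the limit, and records which ones raise `ValueError`. The function name and the choice of sample conversions are illustrative assumptions.

```python
import sys


def classify_conversions(limit=640):
    """Map each conversion to True if it raised ValueError under `limit`.

    Returns an empty dict on interpreters without the limit APIs.
    """
    if not hasattr(sys, "set_int_max_str_digits"):
        return {}
    previous = sys.get_int_max_str_digits()
    sys.set_int_max_str_digits(limit)
    try:
        digits = "1" * (2 * limit)   # digit string well above the limit
        value = 10 ** (2 * limit)    # integer with 2 * limit + 1 decimal digits
        n_bytes = (value.bit_length() + 7) // 8
        conversions = {
            # Expected to be limited (non-power-of-two bases).
            "int(digits)": lambda: int(digits),
            "int(digits, 7)": lambda: int(digits, 7),
            "str(value)": lambda: str(value),
            "repr(value)": lambda: repr(value),
            "f-string": lambda: f"{value}",
            # Expected to be unlimited (linear-time conversions).
            "int(digits, 2)": lambda: int(digits, 2),
            "hex(value)": lambda: hex(value),
            "format(value, 'x')": lambda: format(value, "x"),
            "to_bytes": lambda: value.to_bytes(n_bytes, "big"),
            "float(digits)": lambda: float(digits),
        }
        outcome = {}
        for name, convert in conversions.items():
            try:
                convert()
            except ValueError:
                outcome[name] = True
            else:
                outcome[name] = False
        return outcome
    finally:
        # Always restore the interpreter-wide setting.
        sys.set_int_max_str_digits(previous)
```

Restoring the previous value in `finally` matters because the limit is interpreter-wide; leaving it lowered would affect unrelated code.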
diff --git a/Doc/library/sys.rst b/Doc/library/sys.rst
index 7e11dc0..25f79d4 100644
--- a/Doc/library/sys.rst
+++ b/Doc/library/sys.rst
@@ -445,9 +445,9 @@ always available.
The :term:`named tuple` *flags* exposes the status of command line
flags. The attributes are read only.
- ============================= =============================
+ ============================= ==============================================================================================================
attribute flag
- ============================= =============================
+ ============================= ==============================================================================================================
:const:`debug` :option:`-d`
:const:`inspect` :option:`-i`
:const:`interactive` :option:`-i`
@@ -463,7 +463,8 @@ always available.
:const:`hash_randomization` :option:`-R`
:const:`dev_mode` :option:`-X` ``dev``
:const:`utf8_mode` :option:`-X` ``utf8``
- ============================= =============================
+ :const:`int_max_str_digits` :option:`-X int_max_str_digits <-X>` (:ref:`integer string conversion length limitation <int_max_str_digits>`)
+ ============================= ==============================================================================================================
.. versionchanged:: 3.2
Added ``quiet`` attribute for the new :option:`-q` flag.
@@ -481,6 +482,9 @@ always available.
Added ``dev_mode`` attribute for the new :option:`-X` ``dev`` flag
and ``utf8_mode`` attribute for the new :option:`-X` ``utf8`` flag.
+ .. versionchanged:: 3.8.14
+ Added the ``int_max_str_digits`` attribute.
+
.. data:: float_info
@@ -661,6 +665,15 @@ always available.
.. versionadded:: 3.6
+
+.. function:: get_int_max_str_digits()
+
+ Returns the current value for the :ref:`integer string conversion length
+ limitation <int_max_str_digits>`. See also :func:`set_int_max_str_digits`.
+
+ .. versionadded:: 3.8.14
+
+
.. function:: getrefcount(object)
Return the reference count of the *object*. The count returned is generally one
@@ -934,19 +947,31 @@ always available.
.. tabularcolumns:: |l|L|
- +-------------------------+----------------------------------------------+
- | Attribute               | Explanation                                  |
- +=========================+==============================================+
- | :const:`bits_per_digit` | number of bits held in each digit. Python    |
- |                         | integers are stored internally in base       |
- |                         | ``2**int_info.bits_per_digit``               |
- +-------------------------+----------------------------------------------+
- | :const:`sizeof_digit`   | size in bytes of the C type used to          |
- |                         | represent a digit                            |
- +-------------------------+----------------------------------------------+
+ +----------------------------------------+-----------------------------------------------+
+ | Attribute                              | Explanation                                   |
+ +========================================+===============================================+
+ | :const:`bits_per_digit`                | number of bits held in each digit. Python     |
+ |                                        | integers are stored internally in base        |
+ |                                        | ``2**int_info.bits_per_digit``                |
+ +----------------------------------------+-----------------------------------------------+
+ | :const:`sizeof_digit`                  | size in bytes of the C type used to           |
+ |                                        | represent a digit                             |
+ +----------------------------------------+-----------------------------------------------+
+ | :const:`default_max_str_digits`        | default value for                             |
+ |                                        | :func:`sys.get_int_max_str_digits` when it    |
+ |                                        | is not otherwise explicitly configured.       |
+ +----------------------------------------+-----------------------------------------------+
+ | :const:`str_digits_check_threshold`    | minimum non-zero value for                    |
+ |                                        | :func:`sys.set_int_max_str_digits`,           |
+ |                                        | :envvar:`PYTHONINTMAXSTRDIGITS`, or           |
+ |                                        | :option:`-X int_max_str_digits <-X>`.         |
+ +----------------------------------------+-----------------------------------------------+
.. versionadded:: 3.1
+ .. versionchanged:: 3.8.14
+ Added ``default_max_str_digits`` and ``str_digits_check_threshold``.
+
.. data:: __interactivehook__
@@ -1220,6 +1245,14 @@ always available.
.. availability:: Unix.
+.. function:: set_int_max_str_digits(n)
+
+ Set the :ref:`integer string conversion length limitation
+ <int_max_str_digits>` used by this interpreter. See also
+ :func:`get_int_max_str_digits`.
+
+ .. versionadded:: 3.8.14
+
.. function:: setprofile(profilefunc)
.. index::
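Editorial note, not part of the commit: a short sketch of how the new `sys` attributes and functions fit together. The helper names are hypothetical. The second helper assumes that CPython rejects a non-zero limit below `str_digits_check_threshold` by raising `ValueError`; the documentation above only says the threshold is the lowest accepted value.

```python
import sys


def describe_int_limit():
    """Collect the limit-related values, or None if the APIs are missing."""
    if not hasattr(sys, "get_int_max_str_digits"):
        return None
    return {
        "current": sys.get_int_max_str_digits(),
        "default": sys.int_info.default_max_str_digits,
        "threshold": sys.int_info.str_digits_check_threshold,
        "flag": sys.flags.int_max_str_digits,
    }


def rejects_limit_below_threshold():
    """Return True if a non-zero limit under the threshold is refused."""
    too_low = sys.int_info.str_digits_check_threshold - 1
    previous = sys.get_int_max_str_digits()
    try:
        sys.set_int_max_str_digits(too_low)
    except ValueError:
        return True
    finally:
        # Restore the setting whether or not the call above succeeded.
        sys.set_int_max_str_digits(previous)
    return False
```

Checking `hasattr` first keeps the code usable on interpreters that predate the security releases, which is the same version-agnostic pattern the "Recommended configuration" section uses.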
diff --git a/Doc/library/test.rst b/Doc/library/test.rst
index 6c99f39..aa825b3 100644
--- a/Doc/library/test.rst
+++ b/Doc/library/test.rst
@@ -1283,6 +1283,16 @@ The :mod:`test.support` module defines the following functions:
.. versionadded:: 3.6
+.. function:: adjust_int_max_str_digits(max_digits)
+
+ This function returns a context manager that will change the global
+ :func:`sys.set_int_max_str_digits` setting for the duration of the
+ context to allow execution of test code that needs a different limit
+ on the number of digits when converting between an integer and string.
+
+ .. versionadded:: 3.8.14
+
+
The :mod:`test.support` module defines the following classes:
.. class:: TransientResource(exc, **kwargs)
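Editorial note, not part of the commit: `test.support` is an internal helper package, so application code that wants the same behaviour as `adjust_int_max_str_digits` may prefer its own context manager. The sketch below is a stand-in under that assumption, with a hypothetical name; it is not the `test.support` implementation.

```python
import contextlib
import sys


@contextlib.contextmanager
def temporary_int_max_str_digits(max_digits):
    """Set the interpreter-wide limit inside the block, then restore it."""
    previous = sys.get_int_max_str_digits()
    try:
        sys.set_int_max_str_digits(max_digits)
        yield
    finally:
        sys.set_int_max_str_digits(previous)
```

A typical use is `with temporary_int_max_str_digits(0): text = str(big_value)`, where `0` disables the limit for the duration of the block. Because the setting is global to the interpreter, other threads see the changed limit while the block runs.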
diff --git a/Doc/using/cmdline.rst b/Doc/using/cmdline.rst
index 5aee334..08401d1 100644
--- a/Doc/using/cmdline.rst
+++ b/Doc/using/cmdline.rst
@@ -437,6 +437,9 @@ Miscellaneous options
* ``-X showalloccount`` to output the total count of allocated objects for
each type when the program finishes. This only works when Python was built with
``COUNT_ALLOCS`` defined.
+ * ``-X int_max_str_digits`` configures the :ref:`integer string conversion
+ length limitation <int_max_str_digits>`. See also
+ :envvar:`PYTHONINTMAXSTRDIGITS`.
* ``-X importtime`` to show how long each import takes. It shows module
name, cumulative time (including nested imports) and self time (excluding
nested imports). Note that its output may be broken in multi-threaded
@@ -487,6 +490,9 @@ Miscellaneous options
The ``-X pycache_prefix`` option. The ``-X dev`` option now logs
``close()`` exceptions in :class:`io.IOBase` destructor.
+ .. versionadded:: 3.8.14
+ The ``-X int_max_str_digits`` option.
+
Options you shouldn't use
~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -646,6 +652,13 @@ conflict.
.. versionadded:: 3.2.3
+.. envvar:: PYTHONINTMAXSTRDIGITS
+
+ If this variable is set to an integer, it is used to configure the
+ interpreter's global :ref:`integer string conversion length limitation
+ <int_max_str_digits>`.
+
+ .. versionadded:: 3.8.14
.. envvar:: PYTHONIOENCODING
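Editorial note, not part of the commit: the startup configuration can be observed by launching a child interpreter. This sketch assumes `sys.executable` points at an interpreter that includes the limit; on older interpreters the child fails and `subprocess.run` raises `CalledProcessError`. The function name is hypothetical.

```python
import os
import subprocess
import sys


def limit_in_child(x_option=None, env_value=None):
    """Start a child interpreter and return the limit it reports, as text."""
    env = dict(os.environ)
    env.pop("PYTHONINTMAXSTRDIGITS", None)  # start from a clean setting
    if env_value is not None:
        env["PYTHONINTMAXSTRDIGITS"] = str(env_value)
    cmd = [sys.executable]
    if x_option is not None:
        cmd += ["-X", "int_max_str_digits=%d" % x_option]
    cmd += ["-c", "import sys; print(sys.get_int_max_str_digits())"]
    result = subprocess.run(
        cmd, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Calling it with both arguments, for example `limit_in_child(x_option=640, env_value=1000)`, shows the documented precedence of the `-X` option over the environment variable.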
diff --git a/Doc/whatsnew/3.8.rst b/Doc/whatsnew/3.8.rst
index 0c1a669..630e060 100644
--- a/Doc/whatsnew/3.8.rst
+++ b/Doc/whatsnew/3.8.rst
@@ -2325,3 +2325,17 @@ any leading zeros.
(Originally contributed by Christian Heimes in :issue:`36384`, and backported
to 3.8 by Achraf Merzouki)
+
+Notable security feature in 3.8.14
+==================================
+
+Converting between :class:`int` and :class:`str` in bases other than 2
+(binary), 4, 8 (octal), 16 (hexadecimal), or 32, such as base 10 (decimal),
+now raises a :exc:`ValueError` if the number of digits in string form is
+above a limit to avoid potential denial of service attacks due to the
+algorithmic complexity. This is a mitigation for `CVE-2020-10735
+<https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-10735>`_.
+This limit can be configured or disabled by environment variable, command
+line flag, or :mod:`sys` APIs. See the :ref:`integer string conversion
+length limitation <int_max_str_digits>` documentation. The default limit
+is 4300 digits in string form.