From 87d3bd0e02cddc415a42573052110eb9301d2c3d Mon Sep 17 00:00:00 2001
From: Mark Dickinson <dickinsm@gmail.com>
Date: Sun, 8 Jan 2023 19:40:15 +0000
Subject: gh-100833: Remove 'volatile' qualifiers in fsum algorithm (#100845)

This PR removes the `volatile` qualifier on various intermediate quantities
in the `math.fsum` implementation, and updates the notes preceding the
algorithm accordingly (as well as fixing some of the exsting notes). See
the linked issue #100833 for discussion.
---
 .../2023-01-08-12-10-17.gh-issue-100833.f6cT7E.rst |  1 +
 Modules/mathmodule.c                               | 44 +++++++++++-----------
 2 files changed, 23 insertions(+), 22 deletions(-)
 create mode 100644 Misc/NEWS.d/next/Library/2023-01-08-12-10-17.gh-issue-100833.f6cT7E.rst

diff --git a/Misc/NEWS.d/next/Library/2023-01-08-12-10-17.gh-issue-100833.f6cT7E.rst b/Misc/NEWS.d/next/Library/2023-01-08-12-10-17.gh-issue-100833.f6cT7E.rst
new file mode 100644
index 0000000..b572e92
--- /dev/null
+++ b/Misc/NEWS.d/next/Library/2023-01-08-12-10-17.gh-issue-100833.f6cT7E.rst
@@ -0,0 +1 @@
+Speed up :func:`math.fsum` by removing defensive ``volatile`` qualifiers.
diff --git a/Modules/mathmodule.c b/Modules/mathmodule.c
index 11e815c..9e4c6fd 100644
--- a/Modules/mathmodule.c
+++ b/Modules/mathmodule.c
@@ -1358,30 +1358,30 @@ FUNC1(tanh, tanh, 0,
    Dickinson's post at <http://bugs.python.org/file10357/msum4.py>.
    See those links for more details, proofs and other references.
 
-   Note 1: IEEE 754R floating point semantics are assumed,
-   but the current implementation does not re-establish special
-   value semantics across iterations (i.e. handling -Inf + Inf).
+   Note 1: IEEE 754 floating-point semantics with a rounding mode of
+   roundTiesToEven are assumed.
 
-   Note 2:  No provision is made for intermediate overflow handling;
-   therefore, sum([1e+308, 1e-308, 1e+308]) returns 1e+308 while
-   sum([1e+308, 1e+308, 1e-308]) raises an OverflowError due to the
+   Note 2: No provision is made for intermediate overflow handling;
+   therefore, fsum([1e+308, -1e+308, 1e+308]) returns 1e+308 while
+   fsum([1e+308, 1e+308, -1e+308]) raises an OverflowError due to the
    overflow of the first partial sum.
 
-   Note 3: The intermediate values lo, yr, and hi are declared volatile so
-   aggressive compilers won't algebraically reduce lo to always be exactly 0.0.
-   Also, the volatile declaration forces the values to be stored in memory as
-   regular doubles instead of extended long precision (80-bit) values.  This
-   prevents double rounding because any addition or subtraction of two doubles
-   can be resolved exactly into double-sized hi and lo values.  As long as the
-   hi value gets forced into a double before yr and lo are computed, the extra
-   bits in downstream extended precision operations (x87 for example) will be
-   exactly zero and therefore can be losslessly stored back into a double,
-   thereby preventing double rounding.
-
-   Note 4: A similar implementation is in Modules/cmathmodule.c.
-   Be sure to update both when making changes.
-
-   Note 5: The signature of math.fsum() differs from builtins.sum()
+   Note 3: The algorithm has two potential sources of fragility. First, C
+   permits arithmetic operations on `double`s to be performed in an
+   intermediate format whose range and precision may be greater than those of
+   `double` (see for example C99 §5.2.4.2.2, paragraph 8). This can happen for
+   example on machines using the now largely historical x87 FPUs. In this case,
+   `fsum` can produce incorrect results. If `FLT_EVAL_METHOD` is `0` or `1`, or
+   `FLT_EVAL_METHOD` is `2` and `long double` is identical to `double`, then we
+   should be safe from this source of errors. Second, an aggressively
+   optimizing compiler can re-associate operations so that (for example) the
+   statement `yr = hi - x;` is treated as `yr = (x + y) - x` and then
+   re-associated as `yr = y + (x - x)`, giving `y = yr` and `lo = 0.0`. That
+   re-association would be in violation of the C standard, and should not occur
+   except possibly in the presence of unsafe optimizations (e.g., -ffast-math,
+   -fassociative-math). Such optimizations should be avoided for this module.
+
+   Note 4: The signature of math.fsum() differs from builtins.sum()
    because the start argument doesn't make sense in the context of
    accurate summation.  Since the partials table is collapsed before
    returning a result, sum(seq2, start=sum(seq1)) may not equal the
@@ -1467,7 +1467,7 @@ math_fsum(PyObject *module, PyObject *seq)
     Py_ssize_t i, j, n = 0, m = NUM_PARTIALS;
     double x, y, t, ps[NUM_PARTIALS], *p = ps;
     double xsave, special_sum = 0.0, inf_sum = 0.0;
-    volatile double hi, yr, lo;
+    double hi, yr, lo;
 
     iter = PyObject_GetIter(seq);
     if (iter == NULL)
-- 
cgit v0.12