From 87d3bd0e02cddc415a42573052110eb9301d2c3d Mon Sep 17 00:00:00 2001 From: Mark Dickinson Date: Sun, 8 Jan 2023 19:40:15 +0000 Subject: gh-100833: Remove 'volatile' qualifiers in fsum algorithm (#100845) This PR removes the `volatile` qualifier on various intermediate quantities in the `math.fsum` implementation, and updates the notes preceding the algorithm accordingly (as well as fixing some of the exsting notes). See the linked issue #100833 for discussion. --- .../2023-01-08-12-10-17.gh-issue-100833.f6cT7E.rst | 1 + Modules/mathmodule.c | 44 +++++++++++----------- 2 files changed, 23 insertions(+), 22 deletions(-) create mode 100644 Misc/NEWS.d/next/Library/2023-01-08-12-10-17.gh-issue-100833.f6cT7E.rst diff --git a/Misc/NEWS.d/next/Library/2023-01-08-12-10-17.gh-issue-100833.f6cT7E.rst b/Misc/NEWS.d/next/Library/2023-01-08-12-10-17.gh-issue-100833.f6cT7E.rst new file mode 100644 index 0000000..b572e92 --- /dev/null +++ b/Misc/NEWS.d/next/Library/2023-01-08-12-10-17.gh-issue-100833.f6cT7E.rst @@ -0,0 +1 @@ +Speed up :func:`math.fsum` by removing defensive ``volatile`` qualifiers. diff --git a/Modules/mathmodule.c b/Modules/mathmodule.c index 11e815c..9e4c6fd 100644 --- a/Modules/mathmodule.c +++ b/Modules/mathmodule.c @@ -1358,30 +1358,30 @@ FUNC1(tanh, tanh, 0, Dickinson's post at . See those links for more details, proofs and other references. - Note 1: IEEE 754R floating point semantics are assumed, - but the current implementation does not re-establish special - value semantics across iterations (i.e. handling -Inf + Inf). + Note 1: IEEE 754 floating-point semantics with a rounding mode of + roundTiesToEven are assumed. - Note 2: No provision is made for intermediate overflow handling; - therefore, sum([1e+308, 1e-308, 1e+308]) returns 1e+308 while - sum([1e+308, 1e+308, 1e-308]) raises an OverflowError due to the + Note 2: No provision is made for intermediate overflow handling; + therefore, fsum([1e+308, -1e+308, 1e+308]) returns 1e+308 while + fsum([1e+308, 1e+308, -1e+308]) raises an OverflowError due to the overflow of the first partial sum. - Note 3: The intermediate values lo, yr, and hi are declared volatile so - aggressive compilers won't algebraically reduce lo to always be exactly 0.0. - Also, the volatile declaration forces the values to be stored in memory as - regular doubles instead of extended long precision (80-bit) values. This - prevents double rounding because any addition or subtraction of two doubles - can be resolved exactly into double-sized hi and lo values. As long as the - hi value gets forced into a double before yr and lo are computed, the extra - bits in downstream extended precision operations (x87 for example) will be - exactly zero and therefore can be losslessly stored back into a double, - thereby preventing double rounding. - - Note 4: A similar implementation is in Modules/cmathmodule.c. - Be sure to update both when making changes. - - Note 5: The signature of math.fsum() differs from builtins.sum() + Note 3: The algorithm has two potential sources of fragility. First, C + permits arithmetic operations on `double`s to be performed in an + intermediate format whose range and precision may be greater than those of + `double` (see for example C99 ยง5.2.4.2.2, paragraph 8). This can happen for + example on machines using the now largely historical x87 FPUs. In this case, + `fsum` can produce incorrect results. If `FLT_EVAL_METHOD` is `0` or `1`, or + `FLT_EVAL_METHOD` is `2` and `long double` is identical to `double`, then we + should be safe from this source of errors. Second, an aggressively + optimizing compiler can re-associate operations so that (for example) the + statement `yr = hi - x;` is treated as `yr = (x + y) - x` and then + re-associated as `yr = y + (x - x)`, giving `y = yr` and `lo = 0.0`. That + re-association would be in violation of the C standard, and should not occur + except possibly in the presence of unsafe optimizations (e.g., -ffast-math, + -fassociative-math). Such optimizations should be avoided for this module. + + Note 4: The signature of math.fsum() differs from builtins.sum() because the start argument doesn't make sense in the context of accurate summation. Since the partials table is collapsed before returning a result, sum(seq2, start=sum(seq1)) may not equal the @@ -1467,7 +1467,7 @@ math_fsum(PyObject *module, PyObject *seq) Py_ssize_t i, j, n = 0, m = NUM_PARTIALS; double x, y, t, ps[NUM_PARTIALS], *p = ps; double xsave, special_sum = 0.0, inf_sum = 0.0; - volatile double hi, yr, lo; + double hi, yr, lo; iter = PyObject_GetIter(seq); if (iter == NULL) -- cgit v0.12