diff options
author | Tim Peters <tim.peters@gmail.com> | 2002-08-12 18:25:43 (GMT) |
---|---|---|
committer | Tim Peters <tim.peters@gmail.com> | 2002-08-12 18:25:43 (GMT) |
commit | 115c888b97125e31851fc64cc0292c4fc3b122dd (patch) | |
tree | c42608ce0600e39b0a3ccf8ad21085a9042764fd /Objects | |
parent | d64c1def7ca69e346bade11f2a99651eb8e2ff35 (diff) | |
download | cpython-115c888b97125e31851fc64cc0292c4fc3b122dd.zip cpython-115c888b97125e31851fc64cc0292c4fc3b122dd.tar.gz cpython-115c888b97125e31851fc64cc0292c4fc3b122dd.tar.bz2 |
x_mul(): Made life easier for C optimizers in the "grade school"
algorithm. MSVC 6 wasn't impressed <wink>.
Something odd: the x_mul algorithm appears to get substantially worse
than quadratic time as the inputs grow larger:
bits in each input x_mul time k_mul time
------------------ ---------- ----------
15360 0.01 0.00
30720 0.04 0.01
61440 0.16 0.04
122880 0.64 0.14
245760 2.56 0.40
491520 10.76 1.23
983040 71.28 3.69
1966080 459.31 11.07
That is, x_mul is perfectly quadratic-time until a little burp at
2.56->10.76, and after that goes to hell in a hurry. Under Karatsuba,
doubling the input size "should take" 3 times longer instead of 4, and
that remains the case throughout this range. I conclude that my "be nice
to the cache" reworkings of k_mul() are paying.
Diffstat (limited to 'Objects')
-rw-r--r-- | Objects/longobject.c | 9 |
1 files changed, 5 insertions, 4 deletions
diff --git a/Objects/longobject.c b/Objects/longobject.c index 0eefc90..0801e64 100644 --- a/Objects/longobject.c +++ b/Objects/longobject.c @@ -1539,20 +1539,21 @@ x_mul(PyLongObject *a, PyLongObject *b) twodigits carry = 0; twodigits f = a->ob_digit[i]; int j; + digit *pz = z->ob_digit + i; SIGCHECK({ Py_DECREF(z); return NULL; }) for (j = 0; j < size_b; ++j) { - carry += z->ob_digit[i+j] + b->ob_digit[j] * f; - z->ob_digit[i+j] = (digit) (carry & MASK); + carry += *pz + b->ob_digit[j] * f; + *pz++ = (digit) (carry & MASK); carry >>= SHIFT; } for (; carry != 0; ++j) { assert(i+j < z->ob_size); - carry += z->ob_digit[i+j]; - z->ob_digit[i+j] = (digit) (carry & MASK); + carry += *pz; + *pz++ = (digit) (carry & MASK); carry >>= SHIFT; } } |