x_mul(): This failed to normalize its result.

k_mul(): This didn't allocate enough result space when one input had more than twice as many bits as the other. This was partly hidden by that x_mul() didn't normalize its result. The Karatsuba recurrence is pretty much hosed if the inputs aren't roughly the same size. If one has at least twice as many bits as the other, we get a degenerate case where the "high half" of the smaller input is 0. Added a special case for that, for speed, but despite that it helped, this can still be much slower than the "grade school" method. It seems to take a really wild imbalance to trigger that; e.g., a 2**22-bit input times a 1000-bit input on my box runs about twice as slow under k_mul than under x_mul. This still needs to be addressed. I'm also not sure that allocating a->ob_size + b->ob_size digits is enough, given that this is computing k = (ah+al)*(bh+bl) instead of k = (ah-al)*(bl-bh); i.e., it's certainly enough for the final result, but it's vaguely possible that adding in the "artificially" large k may overflow that temporarily. If so, an assert will trigger in the debug build, but we'll probably compute the right result anyway(!).
author: Tim Peters <tim.peters@gmail.com> 2002-08-12 06:17:58 (GMT)
committer: Tim Peters <tim.peters@gmail.com> 2002-08-12 06:17:58 (GMT)
commit: 44121a6bc9828c993932b87e442440dc4f260f3c (patch)
tree: 9d8be85c1b84bbf7104fb9e0df6cbf2f347725a5
parent: 877a2126786bd2a8e5086fbddb05a593c40cbc51 (diff)
download: cpython-44121a6bc9828c993932b87e442440dc4f260f3c.zip
cpython-44121a6bc9828c993932b87e442440dc4f260f3c.tar.gz
cpython-44121a6bc9828c993932b87e442440dc4f260f3c.tar.bz2
1 files changed, 18 insertions, 6 deletions
diff --git a/Objects/longobject.c b/Objects/longobject.c
index fab7dee..6dedd38 100644
--- a/Objects/longobject.c
+++ b/Objects/longobject.c
@@ -1556,7 +1556,7 @@ x_mul(PyLongObject *a, PyLongObject *b)
 			carry >>= SHIFT;
 		}
 	}
-	return z;
+	return long_normalize(z);
 }
 
 /* A helper for Karatsuba multiplication (k_mul).
@@ -1630,8 +1630,15 @@ k_mul(PyLongObject *a, PyLongObject *b)
 	}
 
 	/* Use gradeschool math when either number is too small. */
-	if (ABS(a->ob_size) <= KARATSUBA_CUTOFF)
-		return x_mul(a, b);
+	if (ABS(a->ob_size) <= KARATSUBA_CUTOFF) {
+		/* 0 is inevitable if one kmul arg has more than twice
+		 * the digits of another, so it's worth special-casing.
+		 */
+		if (a->ob_size == 0)
+			return _PyLong_New(0);
+		else
+			return x_mul(a, b);
+	}
 
 	shift = ABS(b->ob_size) >> 1;
 	if (kmul_split(a, shift, &ah, &al) < 0) goto fail;
@@ -1641,16 +1648,21 @@ k_mul(PyLongObject *a, PyLongObject *b)
 	assert(ahbh->ob_size >= 0);
 
 	/* Allocate result space, and copy ahbh into the high digits. */
-	ret = _PyLong_New(ahbh->ob_size + 2*shift + 1);
+	ret = _PyLong_New(ABS(a->ob_size) + ABS(b->ob_size));
 	if (ret == NULL) goto fail;
 #ifdef Py_DEBUG
 	/* Fill with trash, to catch reference to uninitialized digits. */
 	memset(ret->ob_digit, 0xDF, ret->ob_size * sizeof(digit));
 #endif
+	assert(2*shift + ahbh->ob_size <= ret->ob_size);
 	memcpy(ret->ob_digit + 2*shift, ahbh->ob_digit,
 	       ahbh->ob_size * sizeof(digit));
-	/* That didn't copy into the most-significant (overflow) digit. */
-	ret->ob_digit[ret->ob_size - 1] = 0;
+
+	/* Zero-out the digits higher than the ahbh copy. */
+	i = ret->ob_size - 2*shift - ahbh->ob_size;
+	if (i)
+		memset(ret->ob_digit + 2*shift + ahbh->ob_size, 0,
+		       i * sizeof(digit));
 
 	/* Compute al*bl, and copy into the low digits. */
 	if ((albl = k_mul(al, bl)) == NULL) goto fail;
author	Tim Peters <tim.peters@gmail.com>	2002-08-12 06:17:58 (GMT)
committer	Tim Peters <tim.peters@gmail.com>	2002-08-12 06:17:58 (GMT)
commit	44121a6bc9828c993932b87e442440dc4f260f3c (patch)
tree	9d8be85c1b84bbf7104fb9e0df6cbf2f347725a5
parent	877a2126786bd2a8e5086fbddb05a593c40cbc51 (diff)
download	cpython-44121a6bc9828c993932b87e442440dc4f260f3c.zip cpython-44121a6bc9828c993932b87e442440dc4f260f3c.tar.gz cpython-44121a6bc9828c993932b87e442440dc4f260f3c.tar.bz2