Double-fix of crash in Unicode freelist handling.

If a length-1 Unicode string was in the freelist and its str[0] was uninitialized, or held a value that reads as a very large (magnitude) negative number, the check unicode_latin1[unicode->str[0]] == unicode could cause a segmentation violation, e.g. when unicode->str[0] is 0xcbcbcbcb.

Fix this in two ways:

1. Change the guard before unicode_latin1[] to test against 256U. If I understand correctly, the unsigned long used to store UCS4 on my box was getting converted to a signed long to compare with the signed constant 256.
2. Change _PyUnicode_New() to make sure the first element of str is always initialized to zero. There are several places in the code where the caller can exit with an error before initializing any of str, which would leave junk in str[0].

Also, silence a compiler warning on pointer vs. int arithmetic.

Bug fix candidate.
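The effect of the unsigned constant can be sketched in isolation. The snippet below is an illustration only, not CPython code: `latin1_cache`, `guard_signed`, `guard_unsigned_constant`, and `cache_lookup` are hypothetical names, and it assumes a platform where `int32_t` is a 32-bit `int`. Read as a signed 32-bit value, 0xcbcbcbcb is -875836469, which passes a signed `< 256` test but fails once the comparison is done in unsigned arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 256

/* Hypothetical stand-in for unicode_latin1[]; not CPython code. */
static const void *latin1_cache[TABLE_SIZE];

/* Signed element, signed constant: any negative value passes the guard. */
static int guard_signed(int32_t ch)
{
    return ch < 256;
}

/* Signed element, unsigned constant: assuming int32_t is a 32-bit int,
   ch is converted to unsigned int before the comparison, so a negative
   value becomes a huge positive one and fails the guard. */
static int guard_unsigned_constant(int32_t ch)
{
    return ch < 256U;
}

/* Index the table only when the guard proves the index is in range. */
static const void *cache_lookup(int32_t ch)
{
    if (!guard_unsigned_constant(ch))
        return NULL;
    assert(ch >= 0 && ch < TABLE_SIZE);
    return latin1_cache[ch];
}
```

With the signed guard, a junk value such as -875836469 would be used as a table index and read far outside the array. With the unsigned constant, `cache_lookup` declines to index the table at all. On platforms where the element type is wider than `unsigned int`, the conversion rules differ, so this sketch should not be read as a general guarantee.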
author: Jeremy Hylton <jeremy@alum.mit.edu> 2003-09-16 19:41:39 (GMT)
committer: Jeremy Hylton <jeremy@alum.mit.edu> 2003-09-16 19:41:39 (GMT)
commit: d808279be3850f085b02a5a612246f90daf31ecc (patch)
tree: 70a2b87244ac2f7b9434d41744d1a69703bd47c1
parent: a9e14b70150d5bc064afd3144097ec0095869f10 (diff)
download: cpython-d808279be3850f085b02a5a612246f90daf31ecc.zip
cpython-d808279be3850f085b02a5a612246f90daf31ecc.tar.gz
cpython-d808279be3850f085b02a5a612246f90daf31ecc.tar.bz2
2 files changed, 12 insertions, 2 deletions
diff --git a/Misc/NEWS b/Misc/NEWS
index 6f9855c..7ffe521 100644
--- a/Misc/NEWS
+++ b/Misc/NEWS
@@ -12,6 +12,11 @@ What's New in Python 2.4 alpha 1?
 Core and builtins
 -----------------
 
+- Fixed a bug in the cache of length-one Unicode strings that could
+  lead to a seg fault.  The specific problem occurred when an earlier,
+  non-fatal error left an uninitialized Unicode object in the
+  freelist.
+
 - The % formatting operator now supports '%F' which is equivalent to
   '%f'.  This has always been documented but never implemented.
 
diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c
index e2a16d9..7adcd67 100644
--- a/Objects/unicodeobject.c
+++ b/Objects/unicodeobject.c
@@ -132,7 +132,8 @@ int unicode_resize(register PyUnicodeObject *unicode,
        instead ! */
     if (unicode == unicode_empty || 
 	(unicode->length == 1 && 
-	 unicode->str[0] < 256 &&
+         /* XXX Is unicode->str[] always unsigned? */
+	 unicode->str[0] < 256U &&
 	 unicode_latin1[unicode->str[0]] == unicode)) {
         PyErr_SetString(PyExc_SystemError,
                         "can't resize shared unicode objects");
@@ -211,6 +212,10 @@ PyUnicodeObject *_PyUnicode_New(int length)
 	PyErr_NoMemory();
 	goto onError;
     }
+    /* Initialize the first element to guard against cases where
+       the caller fails before initializing str.
+    */
+    unicode->str[0] = 0;
     unicode->str[length] = 0;
     unicode->length = length;
     unicode->hash = -1;
@@ -2527,7 +2532,7 @@ PyObject *PyUnicode_DecodeASCII(const char *s,
 	else {
 	    startinpos = s-starts;
 	    endinpos = startinpos + 1;
-	    outpos = p-PyUnicode_AS_UNICODE(v);
+	    outpos = p - (Py_UNICODE *)PyUnicode_AS_UNICODE(v);
 	    if (unicode_decode_call_errorhandler(
 		 errors, &errorHandler,
 		 "ascii", "ordinal not in range(128)",
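The second half of the fix, initializing the first element at allocation time, can be sketched outside CPython as follows. Everything here is hypothetical and for illustration: `ubuf`, `ubuf_new`, and `ubuf_free` are made-up names, and the `memset` with 0xCB only simulates the junk that a debug allocator or a recycled freelist entry might leave behind.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer type standing in for a Unicode object. */
typedef struct {
    size_t length;
    uint32_t *str;
} ubuf;

/* Allocate a buffer of `length` elements plus a terminator.
   Returns NULL on allocation failure. */
static ubuf *ubuf_new(size_t length)
{
    ubuf *u = malloc(sizeof *u);
    if (u == NULL)
        return NULL;
    u->str = malloc((length + 1) * sizeof *u->str);
    if (u->str == NULL) {
        free(u);
        return NULL;
    }
    /* Simulated junk (assumption): fill with a 0xCB byte pattern. */
    memset(u->str, 0xCB, (length + 1) * sizeof *u->str);
    /* Zero the first element so a caller that fails before filling
       the buffer never leaves junk in str[0]. */
    u->str[0] = 0;
    u->str[length] = 0;
    u->length = length;
    return u;
}

/* Release a buffer created by ubuf_new(); NULL is accepted. */
static void ubuf_free(ubuf *u)
{
    if (u != NULL) {
        free(u->str);
        free(u);
    }
}
```

If a caller bails out right after `ubuf_new(1)`, the object it abandons has `str[0] == 0`, so a later check that uses `str[0]` as a table index stays in range even though the remaining contents were never written.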