Make identifiers str (not str8) objects throughout.

This affects the parser, various object implementations, and all places that put identifiers into C string literals. In testing, a number of crashes occurred because code failed once the recursion limit was reached (for example, the Unicode interning dictionary ended up with key/value pairs where the key is not the value). To solve these, I added an overflowed flag, which allows 50 more recursions after the limit has been reached and the exception has been raised, and a recursion_critical flag, which indicates that recursion absolutely must be allowed, i.e. that a certain call must not cause a stack overflow exception. There are still some places where both str and str8 are accepted as identifiers; these should eventually be removed.
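
The two flags described in the message can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the code from this commit: thread_state, enter_result, enter_recursive_call, leave_recursive_call, and the limit of 1000 are hypothetical names and values chosen for illustration. Only the 50-call headroom and the meaning of the two flags come from the message, and when the overflowed flag is reset is not stated there, so the sketch leaves it out.

```c
#include <stdbool.h>

#define RECURSION_LIMIT   1000  /* assumed value, for illustration only */
#define OVERFLOW_HEADROOM 50    /* "50 more recursions" from the message */

/* Hypothetical per-thread bookkeeping. */
typedef struct {
    int  depth;               /* current recursion depth */
    bool overflowed;          /* limit was hit and the exception was raised */
    bool recursion_critical;  /* this call must never be refused */
} thread_state;

typedef enum { ENTER_OK, ENTER_RAISE, ENTER_FATAL } enter_result;

/* Called on entry to a recursive call. */
enter_result enter_recursive_call(thread_state *ts)
{
    ts->depth++;
    if (ts->recursion_critical)
        return ENTER_OK;              /* recursion must be allowed */
    if (ts->overflowed) {
        /* Already reported: tolerate a bounded number of extra calls
           so that the exception can be handled. */
        if (ts->depth > RECURSION_LIMIT + OVERFLOW_HEADROOM)
            return ENTER_FATAL;       /* headroom exhausted */
        return ENTER_OK;
    }
    if (ts->depth > RECURSION_LIMIT) {
        ts->depth--;                  /* the refused call does not count */
        ts->overflowed = true;
        return ENTER_RAISE;           /* caller raises the exception */
    }
    return ENTER_OK;
}

/* Called on exit from a recursive call. */
void leave_recursive_call(thread_state *ts)
{
    ts->depth--;
}
```

In this model the first call past the limit is refused, the next 50 calls succeed so that cleanup code can run, and only the call after those is treated as unrecoverable.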
author: Martin v. Löwis <martin@v.loewis.de> 2007-06-10 09:51:05 (GMT)
committer: Martin v. Löwis <martin@v.loewis.de> 2007-06-10 09:51:05 (GMT)
commit: 5b222135f8d2492713994f2cb003980e87ce6a72 (patch)
tree: 3ac3a6a1d7805360ed779e884ca6c4b3f000321f /Parser/tokenizer.c
parent: 38e43c25eede3fa77d90ac8183cc0335f4861f4a (diff)
download: cpython-5b222135f8d2492713994f2cb003980e87ce6a72.zip
cpython-5b222135f8d2492713994f2cb003980e87ce6a72.tar.gz
cpython-5b222135f8d2492713994f2cb003980e87ce6a72.tar.bz2
1 files changed, 13 insertions, 2 deletions
diff --git a/Parser/tokenizer.c b/Parser/tokenizer.c
index f3eeb2c..e7dada6 100644
--- a/Parser/tokenizer.c
+++ b/Parser/tokenizer.c
@@ -18,6 +18,17 @@
 #include "abstract.h"
 #endif /* PGEN */
 
+#define is_potential_identifier_start(c) (\
+                          (c >= 'a' && c <= 'z')\
+		       || (c >= 'A' && c <= 'Z')\
+		       || c == '_')
+
+#define is_potential_identifier_char(c) (\
+                          (c >= 'a' && c <= 'z')\
+		       || (c >= 'A' && c <= 'Z')\
+		       || (c >= '0' && c <= '9')\
+		       || c == '_')
+
 extern char *PyOS_Readline(FILE *, FILE *, char *);
 /* Return malloc'ed string including trailing \n;
    empty malloc'ed string for EOF;
@@ -1209,7 +1220,7 @@ tok_get(register struct tok_state *tok, char **p_start, char **p_end)
 	}
 
 	/* Identifier (most frequent token!) */
-	if (isalpha(c) || c == '_') {
+	if (is_potential_identifier_start(c)) {
 		/* Process r"", u"" and ur"" */
 		switch (c) {
 		case 'r':
@@ -1227,7 +1238,7 @@ tok_get(register struct tok_state *tok, char **p_start, char **p_end)
 				goto letter_quote;
 			break;
 		}
-		while (isalnum(c) || c == '_') {
+		while (is_potential_identifier_char(c)) {
 			c = tok_nextc(tok);
 		}
 		tok_backup(tok, c);
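
The macros added in this patch accept only ASCII letters, digits, and the underscore, whereas the isalpha() and isalnum() calls they replace depend on the current C locale. The following is a minimal sketch of how they classify input, not the tokenizer's actual code: scan_identifier is a hypothetical helper, and the macro arguments are parenthesized here as a defensive habit, which the patch itself does not do.

```c
#include <stddef.h>

/* ASCII-only identifier tests, mirroring the macros in the patch
   (with each use of the argument parenthesized). */
#define is_potential_identifier_start(c) ( \
           ((c) >= 'a' && (c) <= 'z') \
        || ((c) >= 'A' && (c) <= 'Z') \
        || (c) == '_')

#define is_potential_identifier_char(c) ( \
           ((c) >= 'a' && (c) <= 'z') \
        || ((c) >= 'A' && (c) <= 'Z') \
        || ((c) >= '0' && (c) <= '9') \
        || (c) == '_')

/* Hypothetical helper: return the length of the identifier at the
   start of s, or 0 if s does not begin with an identifier. */
size_t scan_identifier(const char *s)
{
    size_t n = 0;

    if (!is_potential_identifier_start(s[0]))
        return 0;
    while (is_potential_identifier_char(s[n]))
        n++;
    return n;
}
```

For example, scan_identifier("foo_1 = 2") returns 5 and scan_identifier("9abc") returns 0. Bytes outside ASCII, such as the UTF-8 encoding of "é", end the scan regardless of the locale.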