diff options
author | Martin v. Löwis <martin@v.loewis.de> | 2007-06-10 09:51:05 (GMT) |
---|---|---|
committer | Martin v. Löwis <martin@v.loewis.de> | 2007-06-10 09:51:05 (GMT) |
commit | 5b222135f8d2492713994f2cb003980e87ce6a72 (patch) | |
tree | 3ac3a6a1d7805360ed779e884ca6c4b3f000321f /Parser/tokenizer.c | |
parent | 38e43c25eede3fa77d90ac8183cc0335f4861f4a (diff) | |
download | cpython-5b222135f8d2492713994f2cb003980e87ce6a72.zip cpython-5b222135f8d2492713994f2cb003980e87ce6a72.tar.gz cpython-5b222135f8d2492713994f2cb003980e87ce6a72.tar.bz2 |
Make identifiers str (not str8) objects throughout.
This affects the parser, various object implementations,
and all places that put identifiers into C string literals.
In testing, a number of crashes occurred as code would
fail when the recursion limit was reached (such as the
Unicode interning dictionary having key/value pairs where
key is not value). To solve these, I added an overflowed
flag, which allows for 50 more recursions after the
limit was reached and the exception was raised, and
a recursion_critical flag, which indicates that recursion
absolutely must be allowed, i.e. that a certain call
must not cause a stack overflow exception.
There are still some places where both str and str8 are
accepted as identifiers; these should eventually be
removed.
Diffstat (limited to 'Parser/tokenizer.c')
-rw-r--r-- | Parser/tokenizer.c | 15 |
1 files changed, 13 insertions, 2 deletions
diff --git a/Parser/tokenizer.c b/Parser/tokenizer.c index f3eeb2c..e7dada6 100644 --- a/Parser/tokenizer.c +++ b/Parser/tokenizer.c @@ -18,6 +18,17 @@ #include "abstract.h" #endif /* PGEN */ +#define is_potential_identifier_start(c) (\ + (c >= 'a' && c <= 'z')\ + || (c >= 'A' && c <= 'Z')\ + || c == '_') + +#define is_potential_identifier_char(c) (\ + (c >= 'a' && c <= 'z')\ + || (c >= 'A' && c <= 'Z')\ + || (c >= '0' && c <= '9')\ + || c == '_') + extern char *PyOS_Readline(FILE *, FILE *, char *); /* Return malloc'ed string including trailing \n; empty malloc'ed string for EOF; @@ -1209,7 +1220,7 @@ tok_get(register struct tok_state *tok, char **p_start, char **p_end) } /* Identifier (most frequent token!) */ - if (isalpha(c) || c == '_') { + if (is_potential_identifier_start(c)) { /* Process r"", u"" and ur"" */ switch (c) { case 'r': @@ -1227,7 +1238,7 @@ tok_get(register struct tok_state *tok, char **p_start, char **p_end) goto letter_quote; break; } - while (isalnum(c) || c == '_') { + while (is_potential_identifier_char(c)) { c = tok_nextc(tok); } tok_backup(tok, c); |