| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The mapping dictionaries can now contain 1-n mappings, meaning
that character ordinals may be mapped to strings or Unicode object,
e.g. 0x0078 ('x') -> u"abc", causing the ordinal to be replaced by
the complete string or Unicode object instead of just one character.
Another feature introduced by the patch is that of mapping oridnals to
the emtpy string. This allows removing characters.
The patch is different from patch #103100 in that it does not cause a
performance hit for the normal use case of 1-1 mappings.
Written by Marc-Andre Lemburg, copyright assigned to Guido van Rossum.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
codec to not apply Latin-1 mappings for keys which are not found
in the mapping dictionaries, but instead treat them as undefined
mappings.
The patch was originally written by Martin v. Loewis with some
additional (cosmetic) changes and an updated test script
by Marc-Andre Lemburg.
The standard codecs were recreated from the most current files
available at the Unicode.org site using the Tools/scripts/gencodec.py
tool.
This patch closes the bugs #116285 and #119960.
|
|
|
|
|
| |
incorrect % characters; characters outside the printable range are
replaced with '?'
|
| |
|
|
|
|
| |
index at which an unknown %-escape was found
|
| |
|
|
|
|
|
|
| |
after unicode_empty has been freed, otherwise it might not point to
the real start of the unicode_freelist. Final closure for SF bug
#110681, Jitterbug PR#398.
|
|
|
|
| |
list. Discovered by Barry, fix approved by MAL.
|
|
|
|
|
|
|
|
| |
Add definitions of INT_MAX and LONG_MAX to pyport.h.
Remove includes of limits.h and conditional definitions of INT_MAX
and LONG_MAX elsewhere.
This closes SourceForge patch #101659 and bug #115323.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
%d,i,u,x,X,o formats.
Note a curious extension to the std C rules: x, X and o formatting can never produce
a sign character in C, so the '+' and ' ' flags are meaningless for them. But
unbounded ints *can* produce a sign character under these conversions (no fixed-
width bitstring is wide enough to hold all negative values in 2's-comp form). So
these flags become meaningful in Python when formatting a Python long which is too
big to fit in a C long. This required shuffling around existing code, which hacked
x and X conversions to death when both the '#' and '0' flags were specified: the
hacks weren't strong enough to deal with the simultaneous possibility of the ' ' or
'+' flags too, since signs were always meaningless before for x and X conversions.
Isomorphic shuffling was required in unicodeobject.c.
Also added dozens of non-trivial new unbounded-int test cases to test_format.py.
|
|
|
|
|
|
|
| |
all, either to see whether the # of chars fit in an int, or that the
amount of memory needed fit in a size_t. Checking these is expensive, but
the alternative is silently wrong answers (as in the bug report) or
core dumps (which were easy to provoke using Unicode strings).
|
|
|
|
|
|
| |
strings. closes PEP-223.
also added \U escape (eight hex digits).
|
|
|
|
| |
`str' is no longer necessary. Gotta turn on -Wall!
|
|
|
|
|
| |
PyUnicode_EncodeUTF8() already returns the created object with the
proper reference count. This fixes an Insure reported memory leak.
|
|
|
|
|
|
|
| |
resized after creation. 0-length strings are usually shared
and _PyString_Resize() fails on these shared strings.
Fixes [ Bug #111667 ] unicode core dump.
|
|
|
|
|
|
| |
Properly end a comment block. It was terminated fine later but by a subsequent
block and. It was also in #if 0. This patch is so trivial I can't believe I am
talking about it. :)
|
|
|
|
|
|
|
| |
function (together with other locale aware ones) should into a new collation
support module. See python-dev for a discussion of this removal.
Note: This patch should also be applied to the 1.6 branch.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the Python Unicode implementation.
The internal buffer used for implementing the buffer protocol
is renamed to defenc to make this change visible. It now holds the
default encoded version of the Unicode object and is calculated
on demand (NULL otherwise).
Since the default encoding defaults to ASCII, this will mean that
Unicode objects which hold non-ASCII characters will no longer
work on C APIs using the "s" or "t" parser markers. C APIs must now
explicitly provide Unicode support via the "u", "U" or "es"/"es#"
parser markers in order to work with non-ASCII Unicode strings.
(Note: this patch will also have to be applied to the 1.6 branch
of the CVS tree.)
|
|
|
|
|
| |
This is a notice without a date, which apparently is not a claim to
copyright but only advice to the reader. IANAL. :-)
|
|
|
|
| |
marked my*.h as obsolete
|
|
|
|
|
| |
char**) and return an int even on PC platforms. If not, please fix
PC/utils/makesrc.c ;-P
|
| |
|
|
|
|
| |
clean out some other warnings
|
| |
|
|
|
|
| |
(patch #100912)
|
|
|
|
|
| |
unicodeobject.c(735) :
error C2143: syntax error : missing ';' before '}'
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The UTF-8 decoder is still buggy (i.e. it doesn't pass Markus Kuhn's
stress test), mainly due to the following construct:
#define UTF8_ERROR(details) do { \
if (utf8_decoding_error(&s, &p, errors, details)) \
goto onError; \
continue; \
} while (0)
(The "continue" statement is supposed to exit from the outer loop,
but of course, it doesn't. Indeed, this is a marvelous example of
the dangers of the C programming language and especially of the C
preprocessor.)
|
|
|
|
|
|
|
|
|
|
| |
comments, docstrings or error messages. I fixed two minor things in
test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't").
There is a minor style issue involved: Guido seems to have preferred English
grammar (behaviour, honour) in a couple places. This patch changes that to
American, which is the more prominent style in the source. I prefer English
myself, so if English is preferred, I'd be happy to supply a patch myself ;)
|
| |
|
|
|
|
| |
better error message for unicode coercion failure
|
|
|
|
|
|
|
|
|
|
| |
value is calculated from the character values, in a way
that makes sure an 8-bit ASCII string and a unicode string
with the same contents get the same hash value.
(as a side effect, this also works for ISO Latin 1 strings).
for more details, see the python-dev discussion.
|
| |
|
|
|
|
|
|
| |
objects including instance objects.
The old API PyUnicode_FromObject() is still available as shortcut.
|
|
|
|
|
| |
corrected some usage of 'unsigned long' where Py_UNICODE
should have been used.
|
|
|
|
| |
true after revision 2.36 was checked in...
|
| |
|
|
|
|
| |
should have been used.
|
|
|
|
| |
to the new alphabetic lookup APIs in unicodectype.c.
|
|
|
|
|
| |
Make unicode_compare a true UTF-16 compare function (includes
support for surrogates).
|
|
|
|
| |
A previous patch by Jack Jansen was accidently reverted.
|
|
|
|
|
|
| |
New buffer overflow checks for formatting strings.
By Trent Mick.
|
| |
|
|
|
|
|
|
|
|
| |
Patch to the standard unicode-escape codec which dynamically
loads the Unicode name to ordinal mapping from the module
ucnhash.
By Bill Tutt.
|
|
|
|
|
| |
Better error message for "1 in unicodestring". Submitted
by Andrew Kuchling.
|
|
|
|
|
|
|
|
| |
Fixed a bug in PyUnicode_Count() which would have caused a
core dump in case of substring coercion failure.
Synchronized .count() with the string method of the same name
to return len(s)+1 for s.count('').
|
|
|
|
|
| |
This patch fixes an optimisation mystery in _PyUnicodeNew causing segfaults
on AIX when the interpreter is compiled with -O.
|
|
|
|
| |
Added code so that .isXXX() testing returns 0 for emtpy strings.
|
|
|
|
|
| |
Fixed a typo and removed a debug printf(). Thanks to Finn Bock
for finding these.
|
| |
|
|
|
|
|
|
|
|
| |
Fixed %c formatting to check for one character arguments. Thanks
to Finn Bock for finding this bug.
Added a fix for bug PR#348 which originated from not resetting
the globals correctly in _PyUnicode_Fini().
|