#4487: have Charset check with codecs for possible aliases.

Previously, passing email a charset name such as 'utf8' gave unexpected results: email accepted the name but did *not* use the 'utf-8' codec for it, even though Python itself recognises 'utf8' as an alias for utf-8. Now Charset checks with codecs for aliases as well as its own internal table. Issue 8898 has been opened to change this further in py3k so that all aliasing is routed through the codecs module.
author: R. David Murray <rdmurray@bitdance.com> 2010-06-04 19:51:06 (GMT)
committer: R. David Murray <rdmurray@bitdance.com> 2010-06-04 19:51:06 (GMT)
commit: e7e505ba6e2ba1768ba81e9dde652b6aff34c386 (patch)
tree: 34592df5dc9e94b983f36d379bbcc26a80488054
parent: eba67c0eac27cfafb779372ef82de09aefcca262 (diff)
download: cpython-e7e505ba6e2ba1768ba81e9dde652b6aff34c386.zip
cpython-e7e505ba6e2ba1768ba81e9dde652b6aff34c386.tar.gz
cpython-e7e505ba6e2ba1768ba81e9dde652b6aff34c386.tar.bz2
3 files changed, 13 insertions, 1 deletions
diff --git a/Lib/email/charset.py b/Lib/email/charset.py
index 9bebf6f..ad56c58 100644
--- a/Lib/email/charset.py
+++ b/Lib/email/charset.py
@@ -9,6 +9,7 @@ __all__ = [
     'add_codec',
     ]
 
+import codecs
 import email.base64mime
 import email.quoprimime
 
@@ -209,7 +210,12 @@ class Charset:
         except UnicodeError:
             raise errors.CharsetError(input_charset)
         input_charset = input_charset.lower()
-        # Set the input charset after filtering through the aliases
+        # Set the input charset after filtering through the aliases and/or codecs
+        if not (input_charset in ALIASES or input_charset in CHARSETS):
+            try:
+                input_charset = codecs.lookup(input_charset).name
+            except LookupError:
+                pass
         self.input_charset = ALIASES.get(input_charset, input_charset)
         # We can try to guess which encoding and conversion to use by the
         # charset_map dictionary.  Try that first, but let the user override
diff --git a/Lib/email/test/test_email.py b/Lib/email/test/test_email.py
index 94eec86..4ce9848 100644
--- a/Lib/email/test/test_email.py
+++ b/Lib/email/test/test_email.py
@@ -2868,6 +2868,9 @@ class TestCharset(unittest.TestCase):
         self.assertEqual(str(charset), 'us-ascii')
         self.assertRaises(Errors.CharsetError, Charset, 'asc\xffii')
 
+    def test_codecs_aliases_accepted(self):
+        charset = Charset('utf8')
+        self.assertEqual(str(charset), 'utf-8')
 
 
 # Test multilingual MIME headers.
diff --git a/Misc/NEWS b/Misc/NEWS
index 0af2157..f82f048 100644
--- a/Misc/NEWS
+++ b/Misc/NEWS
@@ -46,6 +46,9 @@ C-API
 Library
 -------
 
+- Issue #4487: email now accepts as charset aliases all codec aliases
+  accepted by the codecs module.
+
 - Issue #6470: Drop UNC prefix in FixTk.
 
 - Issue #5610: feedparser no longer eats extra characters at the end of
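The lookup order introduced by this patch can be illustrated outside the email package. The following is a minimal standalone sketch, not the code from Lib/email/charset.py: the function name resolve_charset is hypothetical, and ALIASES and CHARSETS here are small stand-in tables assumed for illustration rather than the module's real ones. It shows a name being checked against the internal tables first, then against codecs, and finally left unchanged if neither knows it.

```python
import codecs

# Hypothetical, trimmed-down stand-ins for the tables in email.charset.
ALIASES = {
    'latin_1': 'iso-8859-1',
    'latin-1': 'iso-8859-1',
    'ascii': 'us-ascii',
}
CHARSETS = {'iso-8859-1', 'us-ascii', 'utf-8'}


def resolve_charset(name):
    """Return the canonical charset name for `name` (illustrative helper)."""
    name = name.lower()
    # Only consult codecs when the internal tables do not know the name,
    # so that the internal aliases keep priority.
    if name not in ALIASES and name not in CHARSETS:
        try:
            name = codecs.lookup(name).name
        except LookupError:
            # Unknown to codecs as well: keep the name as given.
            pass
    # Apply the internal alias table last, as the patched code does.
    return ALIASES.get(name, name)


print(resolve_charset('utf8'))
print(resolve_charset('ascii'))
print(resolve_charset('no-such-charset'))
```

With these stand-in tables, 'utf8' resolves through codecs, 'ascii' resolves through the internal alias table without touching codecs, and an unknown name passes through unchanged rather than raising.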