author | Martin v. Löwis <martin@v.loewis.de> | 2003-06-28 08:11:55 (GMT) |
---|---|---|
committer | Martin v. Löwis <martin@v.loewis.de> | 2003-06-28 08:11:55 (GMT) |
commit | 7928f388c445223cef857a9cee94bb20f61d9286 (patch) | |
tree | 6441f13ea10e8bd8c5aef3a64456facc8200b190 | |
parent | ab1e5858eea540e50e8acccdbd37ff86a5afdd19 (diff) | |
Explain source encodings. Fixes #683486.
-rw-r--r-- | Doc/tut/tut.tex | 33 |
1 files changed, 33 insertions, 0 deletions
diff --git a/Doc/tut/tut.tex b/Doc/tut/tut.tex
index 119e075..2104775 100644
--- a/Doc/tut/tut.tex
+++ b/Doc/tut/tut.tex
@@ -303,6 +303,39 @@ beginning of the script and giving the file an executable mode. The
 the hash, or pound, character, \character{\#}, is used to start a
 comment in Python.
 
+\subsection{Source Code Encoding}
+
+It is possible to use encodings different than ASCII in Python source
+files. The best way to do it is to put one more special comment line
+right after \code{#!} line making proper encoding declaration:
+
+\begin{verbatim}
+# -*- coding: iso-8859-1 -*-
+\end{verbatim}
+
+With that declaration, all characters in the source file will be
+treated as belonging to \code{iso-8859-1} encoding, and it will be
+possible to directly write Unicode string literals in the selected
+encoding. The list of possible encodings can be found in the
+\citetitle[../lib/lib.html]{Python Library Reference}, in the section
+on \module{codecs}.
+
+If your editor supports saving files as \code{UTF-8} with an UTF-8
+signature (aka BOM -- Byte Order Mark), you can use that instead of an
+encoding declaration. IDLE supports such saving if
+\code{Options/General/Default Source Encoding/UTF-8} is set. Notice
+that this signature is not understood in older Python releases (2.2
+and earlier), and also not understood by the operating system for
+\code{#!} files.
+
+By using UTF-8 (either through the signature, or a an encoding
+declaration), characters of most languages in the world can be used
+simultaneously in string literals and comments. Using non-ASCII
+characters in identifiers is not supported. To display all these
+characters properly, your editor must recognize that the file is
+UTF-8, and it must use a font that supports all the characters in the
+file.
+
 \subsection{The Interactive Startup File \label{startup}}
 
 % XXX This should probably be dumped in an appendix, since most people
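
The two mechanisms described in the added section (a coding comment placed after the #! line, and a UTF-8 signature) can be checked with a small sketch that asks Python's tokenizer which encoding it would pick for a source file. This is an illustration only, not part of the patch. It assumes a current Python 3 interpreter, where tokenize.detect_encoding exists; that function postdates this 2003 commit. The helper name source_encoding and the sample byte strings are made up for the example.

```python
import io
import tokenize


def source_encoding(source_bytes):
    """Return the encoding the tokenizer detects for raw source bytes.

    Hypothetical helper: it wraps tokenize.detect_encoding, which looks
    for a UTF-8 signature and for a coding comment in the first two lines.
    """
    encoding, _lines_read = tokenize.detect_encoding(
        io.BytesIO(source_bytes).readline
    )
    return encoding


# A script with a #! line followed by an encoding declaration,
# saved as ISO-8859-1 (sample data invented for this sketch).
latin1_script = (
    "#!/usr/bin/env python\n"
    "# -*- coding: iso-8859-1 -*-\n"
    "name = 'L\u00f6wis'\n"
).encode("iso-8859-1")

# A script saved as UTF-8 with a signature (BOM) and no declaration.
bom_script = b"\xef\xbb\xbf" + "name = 'L\u00f6wis'\n".encode("utf-8")

for label, raw in (("declaration", latin1_script), ("signature", bom_script)):
    detected = source_encoding(raw)
    # Decoding with the detected encoding recovers the intended text.
    print(label, detected, repr(raw.decode(detected)))
```

Note that the first sample keeps the #! line first and the declaration second, matching the placement the section recommends; the signature variant carries no #! line, since the section warns that the operating system does not understand the signature in #! files.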