-rwxr-xr-x | Lib/cgi.py | 402 |
1 files changed, 3 insertions, 399 deletions
@@ -4,406 +4,10 @@
 
 This module defines a number of utilities for use by CGI scripts
 written in Python.
-
-
-Introduction
-------------
-
-A CGI script is invoked by an HTTP server, usually to process user
-input submitted through an HTML <FORM> or <ISINPUT> element.
-
-Most often, CGI scripts live in the server's special cgi-bin
-directory. The HTTP server places all sorts of information about the
-request (such as the client's hostname, the requested URL, the query
-string, and lots of other goodies) in the script's shell environment,
-executes the script, and sends the script's output back to the client.
-
-The script's input is connected to the client too, and sometimes the
-form data is read this way; at other times the form data is passed via
-the "query string" part of the URL. This module (cgi.py) is intended
-to take care of the different cases and provide a simpler interface to
-the Python script. It also provides a number of utilities that help
-in debugging scripts, and the latest addition is support for file
-uploads from a form (if your browser supports it -- Grail 0.3 and
-Netscape 2.0 do).
-
-The output of a CGI script should consist of two sections, separated
-by a blank line. The first section contains a number of headers,
-telling the client what kind of data is following. Python code to
-generate a minimal header section looks like this:
-
-    print "Content-type: text/html" # HTML is following
-    print # blank line, end of headers
-
-The second section is usually HTML, which allows the client software
-to display nicely formatted text with header, in-line images, etc.
-Here's Python code that prints a simple piece of HTML:
-
-    print "<TITLE>CGI script output</TITLE>"
-    print "<H1>This is my first CGI script</H1>"
-    print "Hello, world!"
-
-It may not be fully legal HTML according to the letter of the
-standard, but any browser will understand it.
-
-
-Using the cgi module
---------------------
-
-Begin by writing "import cgi". Don't use "from cgi import *" -- the
-module defines all sorts of names for its own use or for backward
-compatibility that you don't want in your namespace.
-
-It's best to use the FieldStorage class. The other classes define in this
-module are provided mostly for backward compatibility. Instantiate it
-exactly once, without arguments. This reads the form contents from
-standard input or the environment (depending on the value of various
-environment variables set according to the CGI standard). Since it may
-consume standard input, it should be instantiated only once.
-
-The FieldStorage instance can be accessed as if it were a Python
-dictionary. For instance, the following code (which assumes that the
-Content-type header and blank line have already been printed) checks that
-the fields "name" and "addr" are both set to a non-empty string:
-
-    form = cgi.FieldStorage()
-    form_ok = 0
-    if form.has_key("name") and form.has_key("addr"):
-        if form["name"].value != "" and form["addr"].value != "":
-            form_ok = 1
-    if not form_ok:
-        print "<H1>Error</H1>"
-        print "Please fill in the name and addr fields."
-        return
-    ...further form processing here...
-
-Here the fields, accessed through form[key], are themselves instances
-of FieldStorage (or MiniFieldStorage, depending on the form encoding).
-
-If the submitted form data contains more than one field with the same
-name, the object retrieved by form[key] is not a (Mini)FieldStorage
-instance but a list of such instances. If you are expecting this
-possibility (i.e., when your HTML form contains multiple fields with
-the same name), use the type() function to determine whether you have
-a single instance or a list of instances. For example, here's code
-that concatenates any number of username fields, separated by commas:
-
-    username = form["username"]
-    if type(username) is type([]):
-        # Multiple username fields specified
-        usernames = ""
-        for item in username:
-            if usernames:
-                # Next item -- insert comma
-                usernames = usernames + "," + item.value
-            else:
-                # First item -- don't insert comma
-                usernames = item.value
-    else:
-        # Single username field specified
-        usernames = username.value
-
-If a field represents an uploaded file, the value attribute reads the
-entire file in memory as a string. This may not be what you want. You can
-test for an uploaded file by testing either the filename attribute or the
-file attribute. You can then read the data at leisure from the file
-attribute:
-
-    fileitem = form["userfile"]
-    if fileitem.file:
-        # It's an uploaded file; count lines
-        linecount = 0
-        while 1:
-            line = fileitem.file.readline()
-            if not line: break
-            linecount = linecount + 1
-
-The file upload draft standard entertains the possibility of uploading
-multiple files from one field (using a recursive multipart/*
-encoding). When this occurs, the item will be a dictionary-like
-FieldStorage item. This can be determined by testing its type
-attribute, which should have the value "multipart/form-data" (or
-perhaps another string beginning with "multipart/"). It this case, it
-can be iterated over recursively just like the top-level form object.
-
-When a form is submitted in the "old" format (as the query string or as a
-single data part of type application/x-www-form-urlencoded), the items
-will actually be instances of the class MiniFieldStorage. In this case,
-the list, file and filename attributes are always None.
-
-
-Old classes
------------
-
-These classes, present in earlier versions of the cgi module, are still
-supported for backward compatibility. New applications should use the
-FieldStorage class.
-
-SvFormContentDict: single value form content as dictionary; assumes each
-field name occurs in the form only once.
-
-FormContentDict: multiple value form content as dictionary (the form
-items are lists of values). Useful if your form contains multiple
-fields with the same name.
-
-Other classes (FormContent, InterpFormContentDict) are present for
-backwards compatibility with really old applications only. If you still
-use these and would be inconvenienced when they disappeared from a next
-version of this module, drop me a note.
-
-
-Functions
----------
-
-These are useful if you want more control, or if you want to employ
-some of the algorithms implemented in this module in other
-circumstances.
-
-parse(fp, [environ, [keep_blank_values, [strict_parsing]]]): parse a
-form into a Python dictionary.
-
-parse_qs(qs, [keep_blank_values, [strict_parsing]]): parse a query
-string (data of type application/x-www-form-urlencoded). Data are
-returned as a dictionary. The dictionary keys are the unique query
-variable names and the values are lists of vales for each name.
-
-parse_qsl(qs, [keep_blank_values, [strict_parsing]]): parse a query
-string (data of type application/x-www-form-urlencoded). Data are
-returned as a list of (name, value) pairs.
-
-parse_multipart(fp, pdict): parse input of type multipart/form-data (for
-file uploads).
-
-parse_header(string): parse a header like Content-type into a main
-value and a dictionary of parameters.
-
-test(): complete test program.
-
-print_environ(): format the shell environment in HTML.
-
-print_form(form): format a form in HTML.
-
-print_environ_usage(): print a list of useful environment variables in
-HTML.
-
-escape(): convert the characters "&", "<" and ">" to HTML-safe
-sequences. Use this if you need to display text that might contain
-such characters in HTML. To translate URLs for inclusion in the HREF
-attribute of an <A> tag, use urllib.quote().
-
-log(fmt, ...): write a line to a log file; see docs for initlog().
-
-
-Caring about security
----------------------
-
-There's one important rule: if you invoke an external program (e.g.
-via the os.system() or os.popen() functions), make very sure you don't
-pass arbitrary strings received from the client to the shell. This is
-a well-known security hole whereby clever hackers anywhere on the web
-can exploit a gullible CGI script to invoke arbitrary shell commands.
-Even parts of the URL or field names cannot be trusted, since the
-request doesn't have to come from your form!
-
-To be on the safe side, if you must pass a string gotten from a form
-to a shell command, you should make sure the string contains only
-alphanumeric characters, dashes, underscores, and periods.
-
-
-Installing your CGI script on a Unix system
--------------------------------------------
-
-Read the documentation for your HTTP server and check with your local
-system administrator to find the directory where CGI scripts should be
-installed; usually this is in a directory cgi-bin in the server tree.
-
-Make sure that your script is readable and executable by "others"; the
-Unix file mode should be 755 (use "chmod 755 filename"). Make sure
-that the first line of the script contains #! starting in column 1
-followed by the pathname of the Python interpreter, for instance:
-
-    #! /usr/local/bin/python
-
-Make sure the Python interpreter exists and is executable by "others".
-
-Note that it's probably not a good idea to use #! /usr/bin/env python
-here, since the Python interpreter may not be on the default path
-given to CGI scripts!!!
-
-Make sure that any files your script needs to read or write are
-readable or writable, respectively, by "others" -- their mode should
-be 644 for readable and 666 for writable. This is because, for
-security reasons, the HTTP server executes your script as user
-"nobody", without any special privileges. It can only read (write,
-execute) files that everybody can read (write, execute). The current
-directory at execution time is also different (it is usually the
-server's cgi-bin directory) and the set of environment variables is
-also different from what you get at login. in particular, don't count
-on the shell's search path for executables ($PATH) or the Python
-module search path ($PYTHONPATH) to be set to anything interesting.
-
-If you need to load modules from a directory which is not on Python's
-default module search path, you can change the path in your script,
-before importing other modules, e.g.:
-
-    import sys
-    sys.path.insert(0, "/usr/home/joe/lib/python")
-    sys.path.insert(0, "/usr/local/lib/python")
-
-This way, the directory inserted last will be searched first!
-
-Instructions for non-Unix systems will vary; check your HTTP server's
-documentation (it will usually have a section on CGI scripts).
-
-
-Testing your CGI script
------------------------
-
-Unfortunately, a CGI script will generally not run when you try it
-from the command line, and a script that works perfectly from the
-command line may fail mysteriously when run from the server. There's
-one reason why you should still test your script from the command
-line: if it contains a syntax error, the python interpreter won't
-execute it at all, and the HTTP server will most likely send a cryptic
-error to the client.
-
-Assuming your script has no syntax errors, yet it does not work, you
-have no choice but to read the next section:
-
-
-Debugging CGI scripts
----------------------
-
-First of all, check for trivial installation errors -- reading the
-section above on installing your CGI script carefully can save you a
-lot of time. If you wonder whether you have understood the
-installation procedure correctly, try installing a copy of this module
-file (cgi.py) as a CGI script. When invoked as a script, the file
-will dump its environment and the contents of the form in HTML form.
-Give it the right mode etc, and send it a request. If it's installed
-in the standard cgi-bin directory, it should be possible to send it a
-request by entering a URL into your browser of the form:
-
-    http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home
-
-If this gives an error of type 404, the server cannot find the script
--- perhaps you need to install it in a different directory. If it
-gives another error (e.g. 500), there's an installation problem that
-you should fix before trying to go any further. If you get a nicely
-formatted listing of the environment and form content (in this
-example, the fields should be listed as "addr" with value "At Home"
-and "name" with value "Joe Blow"), the cgi.py script has been
-installed correctly. If you follow the same procedure for your own
-script, you should now be able to debug it.
-
-The next step could be to call the cgi module's test() function from
-your script: replace its main code with the single statement
-
-    cgi.test()
-
-This should produce the same results as those gotten from installing
-the cgi.py file itself.
-
-When an ordinary Python script raises an unhandled exception (e.g.,
-because of a typo in a module name, a file that can't be opened,
-etc.), the Python interpreter prints a nice traceback and exits.
-While the Python interpreter will still do this when your CGI script
-raises an exception, most likely the traceback will end up in one of
-the HTTP server's log file, or be discarded altogether.
-
-Fortunately, once you have managed to get your script to execute
-*some* code, it is easy to catch exceptions and cause a traceback to
-be printed. The test() function below in this module is an example.
-Here are the rules:
-
-    1. Import the traceback module (before entering the
-       try-except!)
-
-    2. Make sure you finish printing the headers and the blank
-       line early
-
-    3. Assign sys.stderr to sys.stdout
-
-    3. Wrap all remaining code in a try-except statement
-
-    4. In the except clause, call traceback.print_exc()
-
-For example:
-
-    import sys
-    import traceback
-    print "Content-type: text/html"
-    print
-    sys.stderr = sys.stdout
-    try:
-        ...your code here...
-    except:
-        print "\n\n<PRE>"
-        traceback.print_exc()
-
-Notes: The assignment to sys.stderr is needed because the traceback
-prints to sys.stderr. The print "\n\n<PRE>" statement is necessary to
-disable the word wrapping in HTML.
-
-If you suspect that there may be a problem in importing the traceback
-module, you can use an even more robust approach (which only uses
-built-in modules):
-
-    import sys
-    sys.stderr = sys.stdout
-    print "Content-type: text/plain"
-    print
-    ...your code here...
-
-This relies on the Python interpreter to print the traceback. The
-content type of the output is set to plain text, which disables all
-HTML processing. If your script works, the raw HTML will be displayed
-by your client. If it raises an exception, most likely after the
-first two lines have been printed, a traceback will be displayed.
-Because no HTML interpretation is going on, the traceback will
-readable.
-
-When all else fails, you may want to insert calls to log() to your
-program or even to a copy of the cgi.py file. Note that this requires
-you to set cgi.logfile to the name of a world-writable file before the
-first call to log() is made!
-
-Good luck!
-
-
-Common problems and solutions
------------------------------
-
-- Most HTTP servers buffer the output from CGI scripts until the
-script is completed. This means that it is not possible to display a
-progress report on the client's display while the script is running.
-
-- Check the installation instructions above.
-
-- Check the HTTP server's log files. ("tail -f logfile" in a separate
-window may be useful!)
-
-- Always check a script for syntax errors first, by doing something
-like "python script.py".
-
-- When using any of the debugging techniques, don't forget to add
-"import sys" to the top of the script.
-
-- When invoking external programs, make sure they can be found.
-Usually, this means using absolute path names -- $PATH is usually not
-set to a very useful value in a CGI script.
-
-- When reading or writing external files, make sure they can be read
-or written by every user on the system.
-
-- Don't try to give a CGI script a set-uid mode. This doesn't work on
-most systems, and is a security liability as well.
-
 """
 
-# XXX The module is getting pretty heavy with all those docstrings.
-# Perhaps there should be a slimmed version that doesn't contain all those
-# backwards compatible and debugging classes and functions?
+# XXX Perhaps there should be a slimmed version that doesn't contain
+# all those backwards compatible and debugging classes and functions?
 
 # History
 # -------
@@ -592,7 +196,7 @@ def parse_qsl(qs, keep_blank_values=0, strict_parsing=0):
     name_value_pairs = string.splitfields(qs, '&')
     r=[]
     for name_value in name_value_pairs:
-        nv = string.splitfields(name_value, '=')
+        nv = string.splitfields(name_value, '=', 1)
         if len(nv) != 2:
             if strict_parsing:
                 raise ValueError, "bad query field: %s" % `name_value`
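
Note on the parse_qsl hunk: the only functional change in this diff is the added third argument, 1, which limits the split to the first '='. The sketch below is an illustration only, not code from cgi.py. It assumes that modern str.split with a maxsplit argument behaves like the old string.splitfields for this purpose, and the helper names are hypothetical. It shows why a field whose value itself contains '=' would fail the len(nv) != 2 check before the change and pass it afterwards.

```python
# Illustration only: str.split stands in for the old string.splitfields,
# and the helper names below are hypothetical, not part of cgi.py.


def split_every_equals(name_value):
    """Behaviour before the change: split on every '='."""
    return name_value.split("=")


def split_first_equals(name_value):
    """Behaviour after the change: split on the first '=' only."""
    return name_value.split("=", 1)


field = "expr=a=b"
print(split_every_equals(field))  # ['expr', 'a', 'b']
print(split_first_equals(field))  # ['expr', 'a=b']
```

With the split limited to one, the value keeps any further '=' characters, so the pair still has exactly two parts. A field with no '=' at all still yields a single element and is still caught by the length check.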