Diffstat (limited to 'Doc/library/cgi.rst')
-rw-r--r-- | Doc/library/cgi.rst | 558 |
1 files changed, 558 insertions, 0 deletions
diff --git a/Doc/library/cgi.rst b/Doc/library/cgi.rst new file mode 100644 index 0000000..29ed545 --- /dev/null +++ b/Doc/library/cgi.rst @@ -0,0 +1,558 @@ + +:mod:`cgi` --- Common Gateway Interface support. +================================================ + +.. module:: cgi + :synopsis: Helpers for running Python scripts via the Common Gateway Interface. + + +.. index:: + pair: WWW; server + pair: CGI; protocol + pair: HTTP; protocol + pair: MIME; headers + single: URL + single: Common Gateway Interface + +Support module for Common Gateway Interface (CGI) scripts. + +This module defines a number of utilities for use by CGI scripts written in +Python. + + +Introduction +------------ + +.. _cgi-intro: + +A CGI script is invoked by an HTTP server, usually to process user input +submitted through an HTML ``<FORM>`` or ``<ISINDEX>`` element. + +Most often, CGI scripts live in the server's special :file:`cgi-bin` directory. +The HTTP server places all sorts of information about the request (such as the +client's hostname, the requested URL, the query string, and lots of other +goodies) in the script's shell environment, executes the script, and sends the +script's output back to the client. + +The script's input is connected to the client too, and sometimes the form data +is read this way; at other times the form data is passed via the "query string" +part of the URL. This module is intended to take care of the different cases +and provide a simpler interface to the Python script. It also provides a number +of utilities that help in debugging scripts, and the latest addition is support +for file uploads from a form (if your browser supports it). + +The output of a CGI script should consist of two sections, separated by a blank +line. The first section contains a number of headers, telling the client what +kind of data is following. 
Python code to generate a minimal header section +looks like this:: + + print "Content-Type: text/html" # HTML is following + print # blank line, end of headers + +The second section is usually HTML, which allows the client software to display +nicely formatted text with header, in-line images, etc. Here's Python code that +prints a simple piece of HTML:: + + print "<TITLE>CGI script output</TITLE>" + print "<H1>This is my first CGI script</H1>" + print "Hello, world!" + + +.. _using-the-cgi-module: + +Using the cgi module +-------------------- + +Begin by writing ``import cgi``. Do not use ``from cgi import *`` --- the +module defines all sorts of names for its own use or for backward compatibility +that you don't want in your namespace. + +When you write a new script, consider adding the line:: + + import cgitb; cgitb.enable() + +This activates a special exception handler that will display detailed reports in +the Web browser if any errors occur. If you'd rather not show the guts of your +program to users of your script, you can have the reports saved to files +instead, with a line like this:: + + import cgitb; cgitb.enable(display=0, logdir="/tmp") + +It's very helpful to use this feature during script development. The reports +produced by :mod:`cgitb` provide information that can save you a lot of time in +tracking down bugs. You can always remove the ``cgitb`` line later when you +have tested your script and are confident that it works correctly. + +To get at submitted form data, it's best to use the :class:`FieldStorage` class. +The other classes defined in this module are provided mostly for backward +compatibility. Instantiate it exactly once, without arguments. This reads the +form contents from standard input or the environment (depending on the value of +various environment variables set according to the CGI standard). Since it may +consume standard input, it should be instantiated only once. 
+ +The :class:`FieldStorage` instance can be indexed like a Python dictionary, and +also supports the standard dictionary methods :meth:`has_key` and :meth:`keys`. +The built-in :func:`len` is also supported. Form fields containing empty +strings are ignored and do not appear in the dictionary; to keep such values, +provide a true value for the optional *keep_blank_values* keyword parameter when +creating the :class:`FieldStorage` instance. + +For instance, the following code (which assumes that the +:mailheader:`Content-Type` header and blank line have already been printed) +checks that the fields ``name`` and ``addr`` are both set to a non-empty +string:: + + form = cgi.FieldStorage() + if not (form.has_key("name") and form.has_key("addr")): + print "<H1>Error</H1>" + print "Please fill in the name and addr fields." + return + print "<p>name:", form["name"].value + print "<p>addr:", form["addr"].value + ...further form processing here... + +Here the fields, accessed through ``form[key]``, are themselves instances of +:class:`FieldStorage` (or :class:`MiniFieldStorage`, depending on the form +encoding). The :attr:`value` attribute of the instance yields the string value +of the field. The :meth:`getvalue` method returns this string value directly; +it also accepts an optional second argument as a default to return if the +requested key is not present. + +If the submitted form data contains more than one field with the same name, the +object retrieved by ``form[key]`` is not a :class:`FieldStorage` or +:class:`MiniFieldStorage` instance but a list of such instances. Similarly, in +this situation, ``form.getvalue(key)`` would return a list of strings. If you +expect this possibility (when your HTML form contains multiple fields with the +same name), use the :func:`getlist` function, which always returns a list of +values (so that you do not need to special-case the single item case). 
For +example, this code concatenates any number of username fields, separated by +commas:: + + value = form.getlist("username") + usernames = ",".join(value) + +If a field represents an uploaded file, accessing the value via the +:attr:`value` attribute or the :func:`getvalue` method reads the entire file in +memory as a string. This may not be what you want. You can test for an uploaded +file by testing either the :attr:`filename` attribute or the :attr:`file` +attribute. You can then read the data at leisure from the :attr:`file` +attribute:: + + fileitem = form["userfile"] + if fileitem.file: + # It's an uploaded file; count lines + linecount = 0 + while 1: + line = fileitem.file.readline() + if not line: break + linecount = linecount + 1 + +The file upload draft standard entertains the possibility of uploading multiple +files from one field (using a recursive :mimetype:`multipart/\*` encoding). +When this occurs, the item will be a dictionary-like :class:`FieldStorage` item. +This can be determined by testing its :attr:`type` attribute, which should be +:mimetype:`multipart/form-data` (or perhaps another MIME type matching +:mimetype:`multipart/\*`). In this case, it can be iterated over recursively +just like the top-level form object. + +When a form is submitted in the "old" format (as the query string or as a single +data part of type :mimetype:`application/x-www-form-urlencoded`), the items will +actually be instances of the class :class:`MiniFieldStorage`. In this case, the +:attr:`list`, :attr:`file`, and :attr:`filename` attributes are always ``None``. + + +Higher Level Interface +---------------------- + +.. versionadded:: 2.2 + +The previous section explains how to read CGI form data using the +:class:`FieldStorage` class. This section describes a higher level interface +which was added to this class to allow one to do it in a more readable and +intuitive way. 
The interface doesn't make the techniques described in previous +sections obsolete --- they are still useful to process file uploads efficiently, +for example. + +.. % XXX: Is this true ? + +The interface consists of two simple methods. Using the methods you can process +form data in a generic way, without the need to worry whether only one or more +values were posted under one name. + +In the previous section, you learned to write the following code anytime you +expected a user to post more than one value under one name:: + + item = form.getvalue("item") + if isinstance(item, list): + # The user is requesting more than one item. + else: + # The user is requesting only one item. + +This situation is common, for example, when a form contains a group of multiple +checkboxes with the same name:: + + <input type="checkbox" name="item" value="1" /> + <input type="checkbox" name="item" value="2" /> + +In most situations, however, there's only one form control with a particular +name in a form and then you expect and need only one value associated with this +name. So you write a script containing for example this code:: + + user = form.getvalue("user").upper() + +The problem with the code is that you should never expect that a client will +provide valid input to your scripts. For example, if a curious user appends +another ``user=foo`` pair to the query string, then the script would crash, +because in this situation the ``getvalue("user")`` method call returns a list +instead of a string. Calling the :meth:`upper` method on a list is not valid +(since lists do not have a method of this name) and results in an +:exc:`AttributeError` exception. + +Therefore, the appropriate way to read form data values was to always use the +code which checks whether the obtained value is a single value or a list of +values. That's annoying and leads to less readable scripts.
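The check described above can be wrapped in a small helper so that calling code always receives a list. This is only a minimal sketch: ``as_list`` is a hypothetical name, not part of the :mod:`cgi` module, and it assumes the value comes from ``form.getvalue(key)`` (that is, ``None``, a single string, or a list of strings).

```python
def as_list(value):
    """Normalize a getvalue()-style result to a list.

    Hypothetical helper, not part of the cgi module. The argument is
    assumed to be None (field missing), a single string, or a list of
    strings (field posted more than once).
    """
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]
```

With such a helper, ``as_list(form.getvalue("item"))`` can be looped over without special-casing the single-value case.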
+ +A more convenient approach is to use the methods :meth:`getfirst` and +:meth:`getlist` provided by this higher level interface. + + +.. method:: FieldStorage.getfirst(name[, default]) + + This method always returns only one value associated with form field *name*. + The method returns only the first value in case that more values were posted + under such name. Please note that the order in which the values are received + may vary from browser to browser and should not be counted on. [#]_ If no such + form field or value exists then the method returns the value specified by the + optional parameter *default*. This parameter defaults to ``None`` if not + specified. + + +.. method:: FieldStorage.getlist(name) + + This method always returns a list of values associated with form field *name*. + The method returns an empty list if no such form field or value exists for + *name*. It returns a list consisting of one item if only one such value exists. + +Using these methods you can write nice compact code:: + + import cgi + form = cgi.FieldStorage() + user = form.getfirst("user", "").upper() # This way it's safe. + for item in form.getlist("item"): + do_something(item) + + +Old classes +----------- + +These classes, present in earlier versions of the :mod:`cgi` module, are still +supported for backward compatibility. New applications should use the +:class:`FieldStorage` class. + +:class:`SvFormContentDict` stores single value form content as dictionary; it +assumes each field name occurs in the form only once. + +:class:`FormContentDict` stores multiple value form content as a dictionary (the +form items are lists of values). Useful if your form contains multiple fields +with the same name. + +Other classes (:class:`FormContent`, :class:`InterpFormContentDict`) are present +for backwards compatibility with really old applications only. If you still use +these and would be inconvenienced when they disappeared from a next version of +this module, drop me a note. + + +.. 
_functions-in-cgi-module: + +Functions +--------- + +These are useful if you want more control, or if you want to employ some of the +algorithms implemented in this module in other circumstances. + + +.. function:: parse(fp[, keep_blank_values[, strict_parsing]]) + + Parse a query in the environment or from a file (the file defaults to + ``sys.stdin``). The *keep_blank_values* and *strict_parsing* parameters are + passed to :func:`parse_qs` unchanged. + + +.. function:: parse_qs(qs[, keep_blank_values[, strict_parsing]]) + + Parse a query string given as a string argument (data of type + :mimetype:`application/x-www-form-urlencoded`). Data are returned as a + dictionary. The dictionary keys are the unique query variable names and the + values are lists of values for each name. + + The optional argument *keep_blank_values* is a flag indicating whether blank + values in URL encoded queries should be treated as blank strings. A true value + indicates that blanks should be retained as blank strings. The default false + value indicates that blank values are to be ignored and treated as if they were + not included. + + The optional argument *strict_parsing* is a flag indicating what to do with + parsing errors. If false (the default), errors are silently ignored. If true, + errors raise a :exc:`ValueError` exception. + + Use the :func:`urllib.urlencode` function to convert such dictionaries into + query strings. + + +.. function:: parse_qsl(qs[, keep_blank_values[, strict_parsing]]) + + Parse a query string given as a string argument (data of type + :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of + name, value pairs. + + The optional argument *keep_blank_values* is a flag indicating whether blank + values in URL encoded queries should be treated as blank strings. A true value + indicates that blanks should be retained as blank strings. 
The default false + value indicates that blank values are to be ignored and treated as if they were + not included. + + The optional argument *strict_parsing* is a flag indicating what to do with + parsing errors. If false (the default), errors are silently ignored. If true, + errors raise a :exc:`ValueError` exception. + + Use the :func:`urllib.urlencode` function to convert such lists of pairs into + query strings. + + +.. function:: parse_multipart(fp, pdict) + + Parse input of type :mimetype:`multipart/form-data` (for file uploads). + Arguments are *fp* for the input file and *pdict* for a dictionary containing + other parameters in the :mailheader:`Content-Type` header. + + Returns a dictionary just like :func:`parse_qs`: keys are the field names, and each + value is a list of values for that field. This is easy to use but not much good + if you are expecting megabytes to be uploaded --- in that case, use the + :class:`FieldStorage` class instead, which is much more flexible. + + Note that this does not parse nested multipart parts --- use + :class:`FieldStorage` for that. + + +.. function:: parse_header(string) + + Parse a MIME header (such as :mailheader:`Content-Type`) into a main value and a + dictionary of parameters. + + +.. function:: test() + + Robust test CGI script, usable as main program. Writes minimal HTTP headers and + formats all information provided to the script in HTML form. + + +.. function:: print_environ() + + Format the shell environment in HTML. + + +.. function:: print_form(form) + + Format a form in HTML. + + +.. function:: print_directory() + + Format the current directory in HTML. + + +.. function:: print_environ_usage() + + Print a list of useful (used by CGI) environment variables in HTML. + + +.. function:: escape(s[, quote]) + + Convert the characters ``'&'``, ``'<'`` and ``'>'`` in string *s* to HTML-safe + sequences. Use this if you need to display text that might contain such + characters in HTML.
If the optional flag *quote* is true, the quotation mark + character (``'"'``) is also translated; this helps for inclusion in an HTML + attribute value, as in ``<A HREF="...">``. If the value to be quoted might + include single- or double-quote characters, or both, consider using the + :func:`quoteattr` function in the :mod:`xml.sax.saxutils` module instead. + + +.. _cgi-security: + +Caring about security +--------------------- + +.. index:: pair: CGI; security + +There's one important rule: if you invoke an external program (via the +:func:`os.system` or :func:`os.popen` functions, or others with similar +functionality), make very sure you don't pass arbitrary strings received from +the client to the shell. This is a well-known security hole whereby clever +hackers anywhere on the Web can exploit a gullible CGI script to invoke +arbitrary shell commands. Even parts of the URL or field names cannot be +trusted, since the request doesn't have to come from your form! + +To be on the safe side, if you must pass a string obtained from a form to a shell +command, you should make sure the string contains only alphanumeric characters, +dashes, underscores, and periods. + + +Installing your CGI script on a Unix system +------------------------------------------- + +Read the documentation for your HTTP server and check with your local system +administrator to find the directory where CGI scripts should be installed; +usually this is in a directory :file:`cgi-bin` in the server tree. + +Make sure that your script is readable and executable by "others"; the Unix file +mode should be ``0755`` octal (use ``chmod 0755 filename``). Make sure that the +first line of the script contains ``#!`` starting in column 1 followed by the +pathname of the Python interpreter, for instance:: + + #!/usr/local/bin/python + +Make sure the Python interpreter exists and is executable by "others".
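The installation checks above (permissions for "others" and the ``#!`` line) can be verified from Python. The following is a minimal sketch that assumes Unix permission bits; ``check_cgi_script`` is a hypothetical name and is not provided by the :mod:`cgi` module.

```python
import os
import stat


def check_cgi_script(path):
    """Return a list of problems found with a CGI script file.

    Hypothetical helper, not part of the cgi module; it assumes Unix
    permission bits.
    """
    problems = []
    mode = os.stat(path).st_mode
    # "Others" need both read and execute permission (as in mode 0755).
    if not mode & stat.S_IROTH:
        problems.append("not readable by others")
    if not mode & stat.S_IXOTH:
        problems.append("not executable by others")
    # The first line must start with "#!" in column 1.
    with open(path, "rb") as script:
        first_line = script.readline()
    if not first_line.startswith(b"#!"):
        problems.append("first line does not start with #!")
    return problems
```

An empty list means the script passed these checks. It does not verify that the interpreter named after ``#!`` exists or is itself executable.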
+ +Make sure that any files your script needs to read or write are readable or +writable, respectively, by "others" --- their mode should be ``0644`` for +readable and ``0666`` for writable. This is because, for security reasons, the +HTTP server executes your script as user "nobody", without any special +privileges. It can only read (write, execute) files that everybody can read +(write, execute). The current directory at execution time is also different (it +is usually the server's cgi-bin directory) and the set of environment variables +is also different from what you get when you log in. In particular, don't count +on the shell's search path for executables (:envvar:`PATH`) or the Python module +search path (:envvar:`PYTHONPATH`) to be set to anything interesting. + +If you need to load modules from a directory which is not on Python's default +module search path, you can change the path in your script, before importing +other modules. For example:: + + import sys + sys.path.insert(0, "/usr/home/joe/lib/python") + sys.path.insert(0, "/usr/local/lib/python") + +(This way, the directory inserted last will be searched first!) + +Instructions for non-Unix systems will vary; check your HTTP server's +documentation (it will usually have a section on CGI scripts). + + +Testing your CGI script +----------------------- + +Unfortunately, a CGI script will generally not run when you try it from the +command line, and a script that works perfectly from the command line may fail +mysteriously when run from the server. There's one reason why you should still +test your script from the command line: if it contains a syntax error, the +Python interpreter won't execute it at all, and the HTTP server will most likely +send a cryptic error to the client. + +Assuming your script has no syntax errors, yet it does not work, you have no +choice but to read the next section. + + +Debugging CGI scripts +--------------------- + +.. 
index:: pair: CGI; debugging + +First of all, check for trivial installation errors --- reading the section +above on installing your CGI script carefully can save you a lot of time. If +you wonder whether you have understood the installation procedure correctly, try +installing a copy of this module file (:file:`cgi.py`) as a CGI script. When +invoked as a script, the file will dump its environment and the contents of the +form in HTML form. Give it the right mode, etc., and send it a request. If it's +installed in the standard :file:`cgi-bin` directory, it should be possible to +send it a request by entering a URL into your browser of the form:: + + http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home + +If this gives an error of type 404, the server cannot find the script -- perhaps +you need to install it in a different directory. If it gives another error, +there's an installation problem that you should fix before trying to go any +further. If you get a nicely formatted listing of the environment and form +content (in this example, the fields should be listed as "addr" with value "At +Home" and "name" with value "Joe Blow"), the :file:`cgi.py` script has been +installed correctly. If you follow the same procedure for your own script, you +should now be able to debug it. + +The next step could be to call the :mod:`cgi` module's :func:`test` function +from your script: replace its main code with the single statement :: + + cgi.test() + +This should produce the same results as those obtained from installing the +:file:`cgi.py` file itself. + +When an ordinary Python script raises an unhandled exception (for whatever +reason: a typo in a module name, a file that can't be opened, etc.), the +Python interpreter prints a nice traceback and exits. While the Python +interpreter will still do this when your CGI script raises an exception, most +likely the traceback will end up in one of the HTTP server's log files, or be +discarded altogether.
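One way to keep a traceback from being lost is to catch the exception yourself and write the report to standard output as plain text. The sketch below uses the standard :mod:`traceback` module; ``run_cgi`` and its ``main`` argument are hypothetical names, and the sketch assumes the script has not yet written any headers when it fails.

```python
import sys
import traceback


def run_cgi(main, out=sys.stdout):
    """Call main(); on failure, write the traceback to out as plain text.

    Hypothetical wrapper, not part of the cgi module. It assumes main()
    has not written any headers before raising.
    """
    try:
        main()
    except Exception:
        out.write("Content-Type: text/plain\n")
        out.write("\n")  # blank line ends the header section
        out.write(traceback.format_exc())
```

Because the content type is plain text, the client displays the traceback verbatim instead of interpreting it as HTML.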
+ +Fortunately, once you have managed to get your script to execute *some* code, +you can easily send tracebacks to the Web browser using the :mod:`cgitb` module. +If you haven't done so already, just add the line:: + + import cgitb; cgitb.enable() + +to the top of your script. Then try running it again; when a problem occurs, +you should see a detailed report that will likely make apparent the cause of the +crash. + +If you suspect that there may be a problem in importing the :mod:`cgitb` module, +you can use an even more robust approach (which only uses built-in modules):: + + import sys + sys.stderr = sys.stdout + print "Content-Type: text/plain" + print + ...your code here... + +This relies on the Python interpreter to print the traceback. The content type +of the output is set to plain text, which disables all HTML processing. If your +script works, the raw HTML will be displayed by your client. If it raises an +exception, most likely after the first two lines have been printed, a traceback +will be displayed. Because no HTML interpretation is going on, the traceback +will be readable. + + +Common problems and solutions +----------------------------- + +* Most HTTP servers buffer the output from CGI scripts until the script is + completed. This means that it is not possible to display a progress report on + the client's display while the script is running. + +* Check the installation instructions above. + +* Check the HTTP server's log files. (``tail -f logfile`` in a separate window + may be useful!) + +* Always check a script for syntax errors first, by doing something like + ``python script.py``. + +* If your script does not have any syntax errors, try adding ``import cgitb; + cgitb.enable()`` to the top of the script. + +* When invoking external programs, make sure they can be found. Usually, this + means using absolute path names --- :envvar:`PATH` is usually not set to a very + useful value in a CGI script. 
+ +* When reading or writing external files, make sure they can be read or written + by the userid under which your CGI script will be running: this is typically the + userid under which the web server is running, or some explicitly specified + userid for a web server's ``suexec`` feature. + +* Don't try to give a CGI script a set-uid mode. This doesn't work on most + systems, and is a security liability as well. + +.. rubric:: Footnotes + +.. [#] Note that some recent versions of the HTML specification do state what order the + field values should be supplied in, but knowing whether a request was + received from a conforming browser, or even from a browser at all, is tedious + and error-prone. + |