\section{\module{doctest} ---
         Test docstrings represent reality}

\declaremodule{standard}{doctest}
\moduleauthor{Tim Peters}{tim_one@users.sourceforge.net}
\sectionauthor{Tim Peters}{tim_one@users.sourceforge.net}
\sectionauthor{Moshe Zadka}{moshez@debian.org}

\modulesynopsis{A framework for verifying examples in docstrings.}

The \module{doctest} module searches a module's docstrings for text that looks
like an interactive Python session, then executes all such sessions to verify
they still work exactly as shown.  Here's a complete but small example:

\begin{verbatim}
"""
This is module example.

Example supplies one function, factorial.  For example,

>>> factorial(5)
120
"""

def factorial(n):
    """Return the factorial of n, an exact integer >= 0.

    If the result is small enough to fit in an int, return an int.
    Else return a long.

    >>> [factorial(n) for n in range(6)]
    [1, 1, 2, 6, 24, 120]
    >>> [factorial(long(n)) for n in range(6)]
    [1, 1, 2, 6, 24, 120]
    >>> factorial(30)
    265252859812191058636308480000000L
    >>> factorial(30L)
    265252859812191058636308480000000L
    >>> factorial(-1)
    Traceback (most recent call last):
        ...
    ValueError: n must be >= 0

    Factorials of floats are OK, but the float must be an exact integer:
    >>> factorial(30.1)
    Traceback (most recent call last):
        ...
    ValueError: n must be exact integer
    >>> factorial(30.0)
    265252859812191058636308480000000L

    It must also not be ridiculously large:
    >>> factorial(1e100)
    Traceback (most recent call last):
        ...
    OverflowError: n too large
    """

\end{verbatim}
% allow LaTeX to break here.
\begin{verbatim}

    import math
    if not n >= 0:
        raise ValueError("n must be >= 0")
    if math.floor(n) != n:
        raise ValueError("n must be exact integer")
    if n+1 == n:  # catch a value like 1e300
        raise OverflowError("n too large")
    result = 1
    factor = 2
    while factor <= n:
        try:
            result *= factor
        except OverflowError:
            result *= long(factor)
        factor += 1
    return result

def _test():
    import doctest
    return doctest.testmod()

if __name__ == "__main__":
    _test()
\end{verbatim}

If you run \file{example.py} directly from the command line,
\module{doctest} works its magic:

\begin{verbatim}
$ python example.py
$
\end{verbatim}

There's no output!  That's normal, and it means all the examples
worked.  Pass \programopt{-v} to the script, and \module{doctest}
prints a detailed log of what it's trying, and prints a summary at the
end:

\begin{verbatim}
$ python example.py -v
Trying: factorial(5)
Expecting: 120
ok
Trying: [factorial(n) for n in range(6)]
Expecting: [1, 1, 2, 6, 24, 120]
ok
Trying: [factorial(long(n)) for n in range(6)]
Expecting: [1, 1, 2, 6, 24, 120]
ok
\end{verbatim}

And so on, eventually ending with:

\begin{verbatim}
Trying: factorial(1e100)
Expecting:
    Traceback (most recent call last):
        ...
    OverflowError: n too large
ok
2 items passed all tests:
   1 tests in example
   8 tests in example.factorial
9 tests in 2 items.
9 passed and 0 failed.
Test passed.
$
\end{verbatim}

That's all you need to know to start making productive use of
\module{doctest}!  Jump in.  The following sections provide full
details.  Note that there are many examples of doctests in
the standard Python test suite and libraries.

\subsection{Simple Usage}

The simplest way to start using doctest (but not necessarily the way
you'll continue to do it) is to end each module \module{M} with:

\begin{verbatim}
def _test():
    import doctest
    return doctest.testmod()

if __name__ == "__main__":
    _test()
\end{verbatim}

\module{doctest} then examines docstrings in the module calling
\function{testmod()}.

Running the module as a script causes the examples in the docstrings
to get executed and verified:

\begin{verbatim}
python M.py
\end{verbatim}

This won't display anything unless an example fails, in which case the
failing example(s) and the cause(s) of the failure(s) are printed to stdout,
and the final line of output is
\samp{'***Test Failed*** \var{N} failures.'}, where \var{N} is the
number of examples that failed.

Run it with the \programopt{-v} switch instead:

\begin{verbatim}
python M.py -v
\end{verbatim}

and a detailed report of all examples tried is printed to standard
output, along with assorted summaries at the end.

You can force verbose mode by passing \code{verbose=True} to
\function{testmod()}, or
prohibit it by passing \code{verbose=False}.  In either of those cases,
\code{sys.argv} is not examined by \function{testmod()}.

In any case, \function{testmod()} returns a 2-tuple of ints \code{(\var{f},
\var{t})}, where \var{f} is the number of docstring examples that
failed and \var{t} is the total number of docstring examples
attempted.

\subsection{Which Docstrings Are Examined?}

The module docstring, and all function, class and method docstrings are
searched.  Objects imported into the module are not searched.

In addition, if \code{M.__test__} exists and "is true", it must be a
dict, and each entry maps a (string) name to a function object, class
object, or string.  Function and class object docstrings found from
\code{M.__test__} are searched, and strings are treated as if they
were docstrings.  In output, a key \code{K} in \code{M.__test__} appears
with name

\begin{verbatim}
<name of M>.__test__.K
\end{verbatim}

Any classes found are recursively searched similarly, to test docstrings in
their contained methods and nested classes.

\versionchanged[A "private name" concept is deprecated and no longer
                documented]{2.4}


\subsection{What's the Execution Context?}

By default, each time \function{testmod()} finds a docstring to test, it
uses a \emph{shallow copy} of \module{M}'s globals, so that running tests
doesn't change the module's real globals, and so that one test in
\module{M} can't leave behind crumbs that accidentally allow another test
to work.  This means examples can freely use any names defined at top-level
in \module{M}, and names defined earlier in the docstring being run.
Examples cannot see names defined in other docstrings.

You can force use of your own dict as the execution context by passing
\code{globs=your_dict} to \function{testmod()} instead.

\subsection{What About Exceptions?}

No problem, provided that the traceback is the only output produced by
the example:  just paste in the traceback.  Since tracebacks contain
details that are likely to change rapidly (for example, exact file paths
and line numbers), this is one case where doctest works hard to be
flexible in what it accepts.

Simple example:

\begin{verbatim}
>>> [1, 2, 3].remove(42)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: list.remove(x): x not in list
\end{verbatim}

That doctest succeeds if \exception{ValueError} is raised, with the
\samp{list.remove(x): x not in list} detail as shown.

The expected output for an exception must start with a traceback
header, which may be either of the following two lines, indented the
same as the first line of the example:

\begin{verbatim}
Traceback (most recent call last):
Traceback (innermost last):
\end{verbatim}

The traceback header is followed by an optional traceback stack, whose
contents are ignored by doctest.  The traceback stack is typically
omitted, or copied verbatim from an interactive session.

The traceback stack is followed by the most interesting part:  the
line(s) containing the exception type and detail.  This is usually the
last line of a traceback, but can extend across multiple lines if the
exception has a multi-line detail:

\begin{verbatim}
>>> raise ValueError('multi\n   line\ndetail')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: multi
    line
detail
\end{verbatim}

The last three (starting with \exception{ValueError}) lines are
compared against the exception's type and detail, and the rest are
ignored.

Best practice is to omit the traceback stack, unless it adds
significant documentation value to the example.  So the last example
is probably better as:

\begin{verbatim}
>>> raise ValueError('multi\n   line\ndetail')
Traceback (most recent call last):
    ...
ValueError: multi
    line
detail
\end{verbatim}

Note that tracebacks are treated very specially.  In particular, in the
rewritten example, the use of \samp{...} is independent of doctest's
\constant{ELLIPSIS} option.  The ellipsis in that example could be left
out, or could just as well be three (or three hundred) commas or digits,
or an indented transcript of a Monty Python skit.

Some details you should read once, but won't need to remember:

\begin{itemize}

\item Doctest can't guess whether your expected output came from an
  exception traceback or from ordinary printing.  So, e.g., an example
  that expects \samp{ValueError: 42 is prime} will pass whether
  \exception{ValueError} is actually raised or if the example merely
  prints that traceback text.  In practice, ordinary output rarely begins
  with a traceback header line, so this doesn't create real problems.

\item Each line of the traceback stack (if present) must be indented
  further than the first line of the example, \emph{or} start with a
  non-alphanumeric character.  The first line following the traceback
  header indented the same and starting with an alphanumeric is taken
  to be the start of the exception detail.  Of course this does the
  right thing for genuine tracebacks.

\item When the \constant{IGNORE_EXCEPTION_DETAIL} doctest option is
  is specified, everything following the leftmost colon is ignored.

\end{itemize}

\versionchanged[The ability to handle a multi-line exception detail
                was added]{2.4}


\subsection{Option Flags and Directives\label{doctest-options}}

A number of option flags control various aspects of doctest's
behavior.  Symbolic names for the flags are supplied as module constants,
which can be or'ed together and passed to various functions.  The names
can also be used in doctest directives (see below).

The first group of options define test semantics, controlling
aspects of how doctest decides whether actual output matches an
example's expected output:

\begin{datadesc}{DONT_ACCEPT_TRUE_FOR_1}
    By default, if an expected output block contains just \code{1},
    an actual output block containing just \code{1} or just
    \code{True} is considered to be a match, and similarly for \code{0}
    versus \code{False}.  When \constant{DONT_ACCEPT_TRUE_FOR_1} is
    specified, neither substitution is allowed.  The default behavior
    caters to that Python changed the return type of many functions
    from integer to boolean; doctests expecting "little integer"
    output still work in these cases.  This option will probably go
    away, but not for several years.
\end{datadesc}

\begin{datadesc}{DONT_ACCEPT_BLANKLINE}
    By default, if an expected output block contains a line
    containing only the string \code{<BLANKLINE>}, then that line
    will match a blank line in the actual output.  Because a
    genuinely blank line delimits the expected output, this is
    the only way to communicate that a blank line is expected.  When
    \constant{DONT_ACCEPT_BLANKLINE} is specified, this substitution
    is not allowed.
\end{datadesc}

\begin{datadesc}{NORMALIZE_WHITESPACE}
    When specified, all sequences of whitespace (blanks and newlines) are
    treated as equal.  Any sequence of whitespace within the expected
    output will match any sequence of whitespace within the actual output.
    By default, whitespace must match exactly.
    \constant{NORMALIZE_WHITESPACE} is especially useful when a line
    of expected output is very long, and you want to wrap it across
    multiple lines in your source.
\end{datadesc}

\begin{datadesc}{ELLIPSIS}
    When specified, an ellipsis marker (\code{...}) in the expected output
    can match any substring in the actual output.  This includes
    substrings that span line boundaries, and empty substrings, so it's
    best to keep usage of this simple.  Complicated uses can lead to the
    same kinds of "oops, it matched too much!" surprises that \regexp{.*}
    is prone to in regular expressions.
\end{datadesc}

\begin{datadesc}{IGNORE_EXCEPTION_DETAIL}
    When specified, an example that expects an exception passes if
    an exception of the expected type is raised, even if the exception
    detail does not match.  For example, an example expecting
    \samp{ValueError: 42} will pass if the actual exception raised is
    \samp{ValueError: 3*14}, but will fail, e.g., if
    \exception{TypeError} is raised.

    Note that a similar effect can be obtained using \constant{ELLIPSIS},
    and \constant{IGNORE_EXCEPTION_DETAIL} may go away when Python releases
    prior to 2.4 become uninteresting.  Until then,
    \constant{IGNORE_EXCEPTION_DETAIL} is the only clear way to write a
    doctest that doesn't care about the exception detail yet continues
    to pass under Python releases prior to 2.4 (doctest directives
    appear to be comments to them).  For example,

\begin{verbatim}
>>> (1, 2)[3] = 'moo' #doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object doesn't support item assignment
\end{verbatim}

    passes under Python 2.4 and Python 2.3.  The detail changed in 2.4,
    to say "does not" instead of "doesn't".

\end{datadesc}

\begin{datadesc}{COMPARISON_FLAGS}
    A bitmask or'ing together all the comparison flags above.
\end{datadesc}

The second group of options controls how test failures are reported:

\begin{datadesc}{REPORT_UDIFF}
    When specified, failures that involve multi-line expected and
    actual outputs are displayed using a unified diff.
\end{datadesc}

\begin{datadesc}{REPORT_CDIFF}
    When specified, failures that involve multi-line expected and
    actual outputs will be displayed using a context diff.
\end{datadesc}

\begin{datadesc}{REPORT_NDIFF}
    When specified, differences are computed by \code{difflib.Differ},
    using the same algorithm as the popular \file{ndiff.py} utility.
    This is the only method that marks differences within lines as
    well as across lines.  For example, if a line of expected output
    contains digit \code{1} where actual output contains letter \code{l},
    a line is inserted with a caret marking the mismatching column
    positions.
\end{datadesc}

\begin{datadesc}{REPORT_ONLY_FIRST_FAILURE}
  When specified, display the first failing example in each doctest,
  but suppress output for all remaining examples.  This will prevent
  doctest from reporting correct examples that break because of
  earlier failures; but it might also hide incorrect examples that
  fail independently of the first failure.  When
  \constant{REPORT_ONLY_FIRST_FAILURE} is specified, the remaining
  examples are still run, and still count towards the total number of
  failures reported; only the output is suppressed.
\end{datadesc}

\begin{datadesc}{REPORTING_FLAGS}
    A bitmask or'ing together all the reporting flags above.
\end{datadesc}

A "doctest directive" is a trailing Python comment on a line of a doctest
example:

\begin{productionlist}[doctest]
    \production{directive}
               {"\#" "doctest:" \token{on_or_off} \token{directive_name}}
    \production{on_or_off}
               {"+" | "-"}
    \production{directive_name}
               {"DONT_ACCEPT_BLANKLINE" | "NORMALIZE_WHITESPACE" | ...}
\end{productionlist}

Whitespace is not allowed between the \code{+} or \code{-} and the
directive name.  The directive name can be any of the option names
explained above.

The doctest directives appearing in a single example modify doctest's
behavior for that single example.  Use \code{+} to enable the named
behavior, or \code{-} to disable it.

For example, this test passes:

\begin{verbatim}
>>> print range(20) #doctest: +NORMALIZE_WHITESPACE
[0,   1,  2,  3,  4,  5,  6,  7,  8,  9,
10,  11, 12, 13, 14, 15, 16, 17, 18, 19]
\end{verbatim}

Without the directive it would fail, both because the actual output
doesn't have two blanks before the single-digit list elements, and
because the actual output is on a single line.  This test also passes,
and also requires a directive to do so:

\begin{verbatim}
>>> print range(20) # doctest:+ELLIPSIS
[0, 1, ..., 18, 19]
\end{verbatim}

Only one directive per physical line is accepted.  If you want to
use multiple directives for a single example, you can add
\samp{...} lines to your example containing only directives:

\begin{verbatim}
>>> print range(20) #doctest: +ELLIPSIS
...                 #doctest: +NORMALIZE_WHITESPACE
[0,    1, ...,   18,    19]
\end{verbatim}

Note that since all options are disabled by default, and directives apply
only to the example they appear in, enabling options (via \code{+} in a
directive) is usually the only meaningful choice.  However, option flags
can also be passed to functions that run doctests, establishing different
defaults.  In such cases, disabling an option via \code{-} in a directive
can be useful.

\versionchanged[Constants \constant{DONT_ACCEPT_BLANKLINE},
    \constant{NORMALIZE_WHITESPACE}, \constant{ELLIPSIS},
    \constant{IGNORE_EXCEPTION_DETAIL},
    \constant{REPORT_UDIFF}, \constant{REPORT_CDIFF},
    \constant{REPORT_NDIFF}, \constant{REPORT_ONLY_FIRST_FAILURE},
    \constant{COMPARISON_FLAGS} and \constant{REPORTING_FLAGS}
    were added; by default \code{<BLANKLINE>} in expected output
    matches an empty line in actual output; and doctest directives
    were added]{2.4}


\subsection{Advanced Usage}

Several module level functions are available for controlling how doctests
are run.

\begin{funcdesc}{debug}{module, name}
  Debug a single docstring containing doctests.

  Provide the \var{module} (or dotted name of the module) containing the
  docstring to be debugged and the \var{name} (within the module) of the
  object with the docstring to be debugged.

  The doctest examples are extracted (see function \function{testsource()}),
  and written to a temporary file.  The Python debugger, \refmodule{pdb},
  is then invoked on that file.
  \versionadded{2.3}
\end{funcdesc}

\begin{funcdesc}{testmod}{\optional{m}\optional{, name}\optional{,
                          globs}\optional{, verbose}\optional{,
                          isprivate}\optional{, report}\optional{,
                          optionflags}\optional{, extraglobs}\optional{,
                          raise_on_error}\optional{, exclude_empty}}

  All arguments are optional, and all except for \var{m} should be
  specified in keyword form.

  Test examples in docstrings in functions and classes reachable
  from module \var{m} (or the current module if \var{m} is not supplied
  or is \code{None}), starting with \code{\var{m}.__doc__}.

  Also test examples reachable from dict \code{\var{m}.__test__}, if it
  exists and is not \code{None}.  \code{\var{m}.__test__} maps
  names (strings) to functions, classes and strings; function and class
  docstrings are searched for examples; strings are searched directly,
  as if they were docstrings.

  Only docstrings attached to objects belonging to module \var{m} are
  searched.

  Return \samp{(\var{failure_count}, \var{test_count})}.

  Optional argument \var{name} gives the name of the module; by default,
  or if \code{None}, \code{\var{m}.__name__} is used.

  Optional argument \var{globs} gives a dict to be used as the globals
  when executing examples; by default, or if \code{None},
  \code{\var{m}.__dict__} is used.  A new shallow copy of this dict is
  created for each docstring with examples, so that each docstring's
  examples start with a clean slate.

  Optional argument \var{extraglobs} gives a dict merged into the
  globals used to execute examples.  This works like
  \method{dict.update()}:  if \var{globs} and \var{extraglobs} have a
  common key, the associated value in \var{extraglobs} appears in the
  combined dict.  By default, or if \code{None}, no extra globals are
  used.  This is an advanced feature that allows parameterization of
  doctests.  For example, a doctest can be written for a base class, using
  a generic name for the class, then reused to test any number of
  subclasses by passing an \var{extraglobs} dict mapping the generic
  name to the subclass to be tested.

  Optional argument \var{verbose} prints lots of stuff if true, and prints
  only failures if false; by default, or if \code{None}, it's true
  if and only if \code{'-v'} is in \code{sys.argv}.

  Optional argument \var{report} prints a summary at the end when true,
  else prints nothing at the end.  In verbose mode, the summary is
  detailed, else the summary is very brief (in fact, empty if all tests
  passed).

  Optional argument \var{optionflags} or's together option flags.  See
  see section \ref{doctest-options}.

  Optional argument \var{raise_on_error} defaults to false.  If true,
  an exception is raised upon the first failure or unexpected exception
  in an example.  This allows failures to be post-mortem debugged.
  Default behavior is to continue running examples.

  Optional argument \var{exclude_empty} defaults to false.  If true,
  objects for which no doctests are found are excluded from consideration.
  The default is a backward compatibility hack, so that code still
  using \method{doctest.master.summarize()} in conjunction with
  \function{testmod()} continues to get output for objects with no tests.
  The \var{exclude_empty} argument to the newer \class{DocTestFinder}
  constructor defaults to true.

  Optional argument \var{isprivate} specifies a function used to
  determine whether a name is private.  The default function treats
  all names as public.  \var{isprivate} can be set to
  \code{doctest.is_private} to skip over names that are
  private according to Python's underscore naming convention.
  \deprecated{2.4}{\var{isprivate} was a stupid idea -- don't use it.
  If you need to skip tests based on name, filter the list returned by
  \code{DocTestFinder.find()} instead.}

  \versionchanged[The parameter \var{optionflags} was added]{2.3}

  \versionchanged[The parameters \var{extraglobs}, \var{raise_on_error}
                  and \var{exclude_empty} were added]{2.4}
\end{funcdesc}

\begin{funcdesc}{testsource}{module, name}
  Extract the doctest examples from a docstring.

  Provide the \var{module} (or dotted name of the module) containing the
  tests to be extracted and the \var{name} (within the module) of the object
  with the docstring containing the tests to be extracted.

  The doctest examples are returned as a string containing Python
  code.  The expected output blocks in the examples are converted
  to Python comments.
  \versionadded{2.3}
\end{funcdesc}

\begin{funcdesc}{DocTestSuite}{\optional{module}}
  Convert doctest tests for a module to a
  \class{\refmodule{unittest}.TestSuite}.

  The returned \class{TestSuite} is to be run by the unittest framework
  and runs each doctest in the module.  If any of the doctests fail,
  then the synthesized unit test fails, and a \exception{DocTestTestFailure}
  exception is raised showing the name of the file containing the test and a
  (sometimes approximate) line number.

  The optional \var{module} argument provides the module to be tested.  It
  can be a module object or a (possibly dotted) module name.  If not
  specified, the module calling this function is used.

  Example using one of the many ways that the \refmodule{unittest} module
  can use a \class{TestSuite}:

  \begin{verbatim}
    import unittest
    import doctest
    import my_module_with_doctests

    suite = doctest.DocTestSuite(my_module_with_doctests)
    runner = unittest.TextTestRunner()
    runner.run(suite)
  \end{verbatim}

  \versionadded{2.3}
  \warning{This function does not currently search \code{M.__test__}
  and its search technique does not exactly match \function{testmod()} in
  every detail.  Future versions will bring the two into convergence.}
\end{funcdesc}


\subsection{How are Docstring Examples Recognized?}

In most cases a copy-and-paste of an interactive console session works
fine, but doctest isn't trying to do an exact emulation of any specific
Python shell.  All hard tab characters are expanded to spaces, using
8-column tab stops.  If you don't believe tabs should mean that, too
bad:  don't use hard tabs, or write your own \class{DocTestParser}
class.

\versionchanged[Expanding tabs to spaces is new; previous versions
                tried to preserve hard tabs, with confusing results]{2.4}

\begin{verbatim}
>>> # comments are ignored
>>> x = 12
>>> x
12
>>> if x == 13:
...     print "yes"
... else:
...     print "no"
...     print "NO"
...     print "NO!!!"
...
no
NO
NO!!!
>>>
\end{verbatim}

Any expected output must immediately follow the final
\code{'>\code{>}>~'} or \code{'...~'} line containing the code, and
the expected output (if any) extends to the next \code{'>\code{>}>~'}
or all-whitespace line.

The fine print:

\begin{itemize}

\item Expected output cannot contain an all-whitespace line, since such a
  line is taken to signal the end of expected output.  If expected
  output does contain a blank line, put \code{<BLANKLINE>} in your
  doctest example each place a blank line is expected.
  \versionchanged[\code{<BLANKLINE>} was added; there was no way to
                  use expected output containing empty lines in
                  previous versions]{2.4}

\item Output to stdout is captured, but not output to stderr (exception
  tracebacks are captured via a different means).

\item If you continue a line via backslashing in an interactive session,
  or for any other reason use a backslash, you should use a raw
  docstring, which will preserve your backslahses exactly as you type
  them:

\begin{verbatim}
>>> def f(x):
...     r'''Backslashes in a raw docstring: m\n'''
>>> print f.__doc__
Backslashes in a raw docstring: m\n
\end{verbatim}

  Otherwise, the backslash will be interpreted as part of the string.
  E.g., the "{\textbackslash}" above would be interpreted as a newline
  character.  Alternatively, you can double each backslash in the
  doctest version (and not use a raw string):

\begin{verbatim}
>>> def f(x):
...     '''Backslashes in a raw docstring: m\\n'''
>>> print f.__doc__
Backslashes in a raw docstring: m\n
\end{verbatim}

\item The starting column doesn't matter:

\begin{verbatim}
  >>> assert "Easy!"
        >>> import math
            >>> math.floor(1.9)
            1.0
\end{verbatim}

and as many leading whitespace characters are stripped from the
expected output as appeared in the initial \code{'>\code{>}>~'} line
that started the example.
\end{itemize}

\subsection{Warnings}

\begin{enumerate}

\item \module{doctest} is serious about requiring exact matches in expected
  output.  If even a single character doesn't match, the test fails.  This
  will probably surprise you a few times, as you learn exactly what Python
  does and doesn't guarantee about output.  For example, when printing a
  dict, Python doesn't guarantee that the key-value pairs will be printed
  in any particular order, so a test like

% Hey! What happened to Monty Python examples?
% Tim: ask Guido -- it's his example!
\begin{verbatim}
>>> foo()
{"Hermione": "hippogryph", "Harry": "broomstick"}
>>>
\end{verbatim}

is vulnerable!  One workaround is to do

\begin{verbatim}
>>> foo() == {"Hermione": "hippogryph", "Harry": "broomstick"}
True
>>>
\end{verbatim}

instead.  Another is to do

\begin{verbatim}
>>> d = foo().items()
>>> d.sort()
>>> d
[('Harry', 'broomstick'), ('Hermione', 'hippogryph')]
\end{verbatim}

There are others, but you get the idea.

Another bad idea is to print things that embed an object address, like

\begin{verbatim}
>>> id(1.0) # certain to fail some of the time
7948648
>>>
\end{verbatim}

Floating-point numbers are also subject to small output variations across
platforms, because Python defers to the platform C library for float
formatting, and C libraries vary widely in quality here.

\begin{verbatim}
>>> 1./7  # risky
0.14285714285714285
>>> print 1./7 # safer
0.142857142857
>>> print round(1./7, 6) # much safer
0.142857
\end{verbatim}

Numbers of the form \code{I/2.**J} are safe across all platforms, and I
often contrive doctest examples to produce numbers of that form:

\begin{verbatim}
>>> 3./4  # utterly safe
0.75
\end{verbatim}

Simple fractions are also easier for people to understand, and that makes
for better documentation.

\item Be careful if you have code that must only execute once.

If you have module-level code that must only execute once, a more foolproof
definition of \function{_test()} is

\begin{verbatim}
def _test():
    import doctest, sys
    doctest.testmod()
\end{verbatim}

\item WYSIWYG isn't always the case, starting in Python 2.3.  The
  string form of boolean results changed from \code{'0'} and
  \code{'1'} to \code{'False'} and \code{'True'} in Python 2.3.
  This makes it clumsy to write a doctest showing boolean results that
  passes under multiple versions of Python.  In Python 2.3, by default,
  and as a special case, if an expected output block consists solely
  of \code{'0'} and the actual output block consists solely of
  \code{'False'}, that's accepted as an exact match, and similarly for
  \code{'1'} versus \code{'True'}.  This behavior can be turned off by
  passing the new (in 2.3) module constant
  \constant{DONT_ACCEPT_TRUE_FOR_1} as the value of \function{testmod()}'s
  new (in 2.3) optional \var{optionflags} argument.  Some years after
  the integer spellings of booleans are history, this hack will
  probably be removed again.

\end{enumerate}


\subsection{Soapbox}

The first word in ``doctest'' is ``doc,'' and that's why the author
wrote \refmodule{doctest}: to keep documentation up to date.  It so
happens that \refmodule{doctest} makes a pleasant unit testing
environment, but that's not its primary purpose.

Choose docstring examples with care.  There's an art to this that
needs to be learned---it may not be natural at first.  Examples should
add genuine value to the documentation.  A good example can often be
worth many words.  If possible, show just a few normal cases, show
endcases, show interesting subtle cases, and show an example of each
kind of exception that can be raised.  You're probably testing for
endcases and subtle cases anyway in an interactive shell:
\refmodule{doctest} wants to make it as easy as possible to capture
those sessions, and will verify they continue to work as designed
forever after.

If done with care, the examples will be invaluable for your users, and
will pay back the time it takes to collect them many times over as the
years go by and things change.  I'm still amazed at how often one of
my \refmodule{doctest} examples stops working after a ``harmless''
change.

For exhaustive testing, or testing boring cases that add no value to the
docs, define a \code{__test__} dict instead.  That's what it's for.