diff options
Diffstat (limited to 'Doc/lib/libshlex.tex')
-rw-r--r-- | Doc/lib/libshlex.tex | 279 |
1 files changed, 0 insertions, 279 deletions
diff --git a/Doc/lib/libshlex.tex b/Doc/lib/libshlex.tex deleted file mode 100644 index 230ae9f..0000000 --- a/Doc/lib/libshlex.tex +++ /dev/null @@ -1,279 +0,0 @@ -\section{\module{shlex} --- - Simple lexical analysis} - -\declaremodule{standard}{shlex} -\modulesynopsis{Simple lexical analysis for \UNIX\ shell-like languages.} -\moduleauthor{Eric S. Raymond}{esr@snark.thyrsus.com} -\moduleauthor{Gustavo Niemeyer}{niemeyer@conectiva.com} -\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com} -\sectionauthor{Gustavo Niemeyer}{niemeyer@conectiva.com} - -\versionadded{1.5.2} - -The \class{shlex} class makes it easy to write lexical analyzers for -simple syntaxes resembling that of the \UNIX{} shell. This will often -be useful for writing minilanguages, (for example, in run control -files for Python applications) or for parsing quoted strings. - -\note{The \module{shlex} module currently does not support Unicode input.} - -The \module{shlex} module defines the following functions: - -\begin{funcdesc}{split}{s\optional{, comments\optional{, posix}}} -Split the string \var{s} using shell-like syntax. If \var{comments} is -\constant{False} (the default), the parsing of comments in the given -string will be disabled (setting the \member{commenters} member of the -\class{shlex} instance to the empty string). This function operates -in \POSIX{} mode by default, but uses non-\POSIX{} mode if the -\var{posix} argument is false. -\versionadded{2.3} -\versionchanged[Added the \var{posix} parameter]{2.6} -\note{Since the \function{split()} function instantiates a \class{shlex} - instance, passing \code{None} for \var{s} will read the string - to split from standard input.} -\end{funcdesc} - -The \module{shlex} module defines the following class: - -\begin{classdesc}{shlex}{\optional{instream\optional{, - infile\optional{, posix}}}} -A \class{shlex} instance or subclass instance is a lexical analyzer -object. The initialization argument, if present, specifies where to -read characters from. It must be a file-/stream-like object with -\method{read()} and \method{readline()} methods, or a string (strings -are accepted since Python 2.3). If no argument is given, input will be -taken from \code{sys.stdin}. The second optional argument is a filename -string, which sets the initial value of the \member{infile} member. If -the \var{instream} argument is omitted or equal to \code{sys.stdin}, -this second argument defaults to ``stdin''. The \var{posix} argument -was introduced in Python 2.3, and defines the operational mode. When -\var{posix} is not true (default), the \class{shlex} instance will -operate in compatibility mode. When operating in \POSIX{} mode, -\class{shlex} will try to be as close as possible to the \POSIX{} shell -parsing rules. See section~\ref{shlex-objects}. -\end{classdesc} - -\begin{seealso} - \seemodule{ConfigParser}{Parser for configuration files similar to the - Windows \file{.ini} files.} -\end{seealso} - - -\subsection{shlex Objects \label{shlex-objects}} - -A \class{shlex} instance has the following methods: - -\begin{methoddesc}[shlex]{get_token}{} -Return a token. If tokens have been stacked using -\method{push_token()}, pop a token off the stack. Otherwise, read one -from the input stream. If reading encounters an immediate -end-of-file, \member{self.eof} is returned (the empty string (\code{''}) -in non-\POSIX{} mode, and \code{None} in \POSIX{} mode). -\end{methoddesc} - -\begin{methoddesc}[shlex]{push_token}{str} -Push the argument onto the token stack. -\end{methoddesc} - -\begin{methoddesc}[shlex]{read_token}{} -Read a raw token. Ignore the pushback stack, and do not interpret source -requests. (This is not ordinarily a useful entry point, and is -documented here only for the sake of completeness.) -\end{methoddesc} - -\begin{methoddesc}[shlex]{sourcehook}{filename} -When \class{shlex} detects a source request (see -\member{source} below) this method is given the following token as -argument, and expected to return a tuple consisting of a filename and -an open file-like object. - -Normally, this method first strips any quotes off the argument. If -the result is an absolute pathname, or there was no previous source -request in effect, or the previous source was a stream -(such as \code{sys.stdin}), the result is left alone. Otherwise, if the -result is a relative pathname, the directory part of the name of the -file immediately before it on the source inclusion stack is prepended -(this behavior is like the way the C preprocessor handles -\code{\#include "file.h"}). - -The result of the manipulations is treated as a filename, and returned -as the first component of the tuple, with -\function{open()} called on it to yield the second component. (Note: -this is the reverse of the order of arguments in instance initialization!) - -This hook is exposed so that you can use it to implement directory -search paths, addition of file extensions, and other namespace hacks. -There is no corresponding `close' hook, but a shlex instance will call -the \method{close()} method of the sourced input stream when it -returns \EOF. - -For more explicit control of source stacking, use the -\method{push_source()} and \method{pop_source()} methods. -\end{methoddesc} - -\begin{methoddesc}[shlex]{push_source}{stream\optional{, filename}} -Push an input source stream onto the input stack. If the filename -argument is specified it will later be available for use in error -messages. This is the same method used internally by the -\method{sourcehook} method. -\versionadded{2.1} -\end{methoddesc} - -\begin{methoddesc}[shlex]{pop_source}{} -Pop the last-pushed input source from the input stack. -This is the same method used internally when the lexer reaches -\EOF{} on a stacked input stream. -\versionadded{2.1} -\end{methoddesc} - -\begin{methoddesc}[shlex]{error_leader}{\optional{file\optional{, line}}} -This method generates an error message leader in the format of a -\UNIX{} C compiler error label; the format is \code{'"\%s", line \%d: '}, -where the \samp{\%s} is replaced with the name of the current source -file and the \samp{\%d} with the current input line number (the -optional arguments can be used to override these). - -This convenience is provided to encourage \module{shlex} users to -generate error messages in the standard, parseable format understood -by Emacs and other \UNIX{} tools. -\end{methoddesc} - -Instances of \class{shlex} subclasses have some public instance -variables which either control lexical analysis or can be used for -debugging: - -\begin{memberdesc}[shlex]{commenters} -The string of characters that are recognized as comment beginners. -All characters from the comment beginner to end of line are ignored. -Includes just \character{\#} by default. -\end{memberdesc} - -\begin{memberdesc}[shlex]{wordchars} -The string of characters that will accumulate into multi-character -tokens. By default, includes all \ASCII{} alphanumerics and -underscore. -\end{memberdesc} - -\begin{memberdesc}[shlex]{whitespace} -Characters that will be considered whitespace and skipped. Whitespace -bounds tokens. By default, includes space, tab, linefeed and -carriage-return. -\end{memberdesc} - -\begin{memberdesc}[shlex]{escape} -Characters that will be considered as escape. This will be only used -in \POSIX{} mode, and includes just \character{\textbackslash} by default. -\versionadded{2.3} -\end{memberdesc} - -\begin{memberdesc}[shlex]{quotes} -Characters that will be considered string quotes. The token -accumulates until the same quote is encountered again (thus, different -quote types protect each other as in the shell.) By default, includes -\ASCII{} single and double quotes. -\end{memberdesc} - -\begin{memberdesc}[shlex]{escapedquotes} -Characters in \member{quotes} that will interpret escape characters -defined in \member{escape}. This is only used in \POSIX{} mode, and -includes just \character{"} by default. -\versionadded{2.3} -\end{memberdesc} - -\begin{memberdesc}[shlex]{whitespace_split} -If \code{True}, tokens will only be split in whitespaces. This is useful, for -example, for parsing command lines with \class{shlex}, getting tokens -in a similar way to shell arguments. -\versionadded{2.3} -\end{memberdesc} - -\begin{memberdesc}[shlex]{infile} -The name of the current input file, as initially set at class -instantiation time or stacked by later source requests. It may -be useful to examine this when constructing error messages. -\end{memberdesc} - -\begin{memberdesc}[shlex]{instream} -The input stream from which this \class{shlex} instance is reading -characters. -\end{memberdesc} - -\begin{memberdesc}[shlex]{source} -This member is \code{None} by default. If you assign a string to it, -that string will be recognized as a lexical-level inclusion request -similar to the \samp{source} keyword in various shells. That is, the -immediately following token will opened as a filename and input taken -from that stream until \EOF, at which point the \method{close()} -method of that stream will be called and the input source will again -become the original input stream. Source requests may be stacked any -number of levels deep. -\end{memberdesc} - -\begin{memberdesc}[shlex]{debug} -If this member is numeric and \code{1} or more, a \class{shlex} -instance will print verbose progress output on its behavior. If you -need to use this, you can read the module source code to learn the -details. -\end{memberdesc} - -\begin{memberdesc}[shlex]{lineno} -Source line number (count of newlines seen so far plus one). -\end{memberdesc} - -\begin{memberdesc}[shlex]{token} -The token buffer. It may be useful to examine this when catching -exceptions. -\end{memberdesc} - -\begin{memberdesc}[shlex]{eof} -Token used to determine end of file. This will be set to the empty -string (\code{''}), in non-\POSIX{} mode, and to \code{None} in -\POSIX{} mode. -\versionadded{2.3} -\end{memberdesc} - -\subsection{Parsing Rules\label{shlex-parsing-rules}} - -When operating in non-\POSIX{} mode, \class{shlex} will try to obey to -the following rules. - -\begin{itemize} -\item Quote characters are not recognized within words - (\code{Do"Not"Separate} is parsed as the single word - \code{Do"Not"Separate}); -\item Escape characters are not recognized; -\item Enclosing characters in quotes preserve the literal value of - all characters within the quotes; -\item Closing quotes separate words (\code{"Do"Separate} is parsed - as \code{"Do"} and \code{Separate}); -\item If \member{whitespace_split} is \code{False}, any character not - declared to be a word character, whitespace, or a quote will be - returned as a single-character token. If it is \code{True}, - \class{shlex} will only split words in whitespaces; -\item EOF is signaled with an empty string (\code{''}); -\item It's not possible to parse empty strings, even if quoted. -\end{itemize} - -When operating in \POSIX{} mode, \class{shlex} will try to obey to the -following parsing rules. - -\begin{itemize} -\item Quotes are stripped out, and do not separate words - (\code{"Do"Not"Separate"} is parsed as the single word - \code{DoNotSeparate}); -\item Non-quoted escape characters (e.g. \character{\textbackslash}) - preserve the literal value of the next character that follows; -\item Enclosing characters in quotes which are not part of - \member{escapedquotes} (e.g. \character{'}) preserve the literal - value of all characters within the quotes; -\item Enclosing characters in quotes which are part of - \member{escapedquotes} (e.g. \character{"}) preserves the literal - value of all characters within the quotes, with the exception of - the characters mentioned in \member{escape}. The escape characters - retain its special meaning only when followed by the quote in use, - or the escape character itself. Otherwise the escape character - will be considered a normal character. -\item EOF is signaled with a \constant{None} value; -\item Quoted empty strings (\code{''}) are allowed; -\end{itemize} - |