summaryrefslogtreecommitdiffstats
path: root/Doc/lib/libfileinput.tex
blob: b653429bb370a1e63ed0d09b59ab865b9c2cc944 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
\section{\module{fileinput} ---
         Iterate over lines from multiple input streams}
\declaremodule{standard}{fileinput}
\moduleauthor{Guido van Rossum}{guido@python.org}
\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}

\modulesynopsis{Perl-like iteration over lines from multiple input
streams, with ``save in place'' capability.}


This module implements a helper class and functions to quickly write a
loop over standard input or a list of files.

The typical use is:

\begin{verbatim}
import fileinput
for line in fileinput.input():
    process(line)
\end{verbatim}

This iterates over the lines of all files listed in
\code{sys.argv[1:]}, defaulting to \code{sys.stdin} if the list is
empty.  If a filename is \code{'-'}, it is also replaced by
\code{sys.stdin}.  To specify an alternative list of filenames, pass
it as the first argument to \function{input()}.  A single file name is
also allowed.

All files are opened in text mode by default, but you can override this by
specifying the \var{mode} parameter in the call to \function{input()}
or \class{FileInput()}.  If an I/O error occurs during opening or reading
a file, \exception{IOError} is raised.

If \code{sys.stdin} is used more than once, the second and further use
will return no lines, except perhaps for interactive use, or if it has
been explicitly reset (e.g. using \code{sys.stdin.seek(0)}).

Empty files are opened and immediately closed; the only time their
presence in the list of filenames is noticeable at all is when the
last file opened is empty.

It is possible that the last line of a file does not end in a newline
character; lines are returned including the trailing newline when it
is present.

You can control how files are opened by providing an opening hook via the
\var{openhook} parameter to \function{input()} or \class{FileInput()}.
The hook must be a function that takes two arguments, \var{filename}
and \var{mode}, and returns an accordingly opened file-like object.
Two useful hooks are already provided by this module.

The following function is the primary interface of this module:

\begin{funcdesc}{input}{\optional{files\optional{, inplace\optional{,
                        backup\optional{, mode\optional{, openhook}}}}}}
  Create an instance of the \class{FileInput} class.  The instance
  will be used as global state for the functions of this module, and
  is also returned to use during iteration.  The parameters to this
  function will be passed along to the constructor of the
  \class{FileInput} class.

  \versionchanged[Added the \var{mode} and \var{openhook} parameters]{2.5}
\end{funcdesc}


The following functions use the global state created by
\function{input()}; if there is no active state,
\exception{RuntimeError} is raised.

\begin{funcdesc}{filename}{}
  Return the name of the file currently being read.  Before the first
  line has been read, returns \code{None}.
\end{funcdesc}

\begin{funcdesc}{fileno}{}
  Return the integer ``file descriptor'' for the current file. When no
  file is opened (before the first line and between files), returns
  \code{-1}.
\end{funcdesc}

\begin{funcdesc}{lineno}{}
  Return the cumulative line number of the line that has just been
  read.  Before the first line has been read, returns \code{0}.  After
  the last line of the last file has been read, returns the line
  number of that line.
\end{funcdesc}

\begin{funcdesc}{filelineno}{}
  Return the line number in the current file.  Before the first line
  has been read, returns \code{0}.  After the last line of the last
  file has been read, returns the line number of that line within the
  file.
\end{funcdesc}

\begin{funcdesc}{isfirstline}{}
  Returns true if the line just read is the first line of its file,
  otherwise returns false.
\end{funcdesc}

\begin{funcdesc}{isstdin}{}
  Returns true if the last line was read from \code{sys.stdin},
  otherwise returns false.
\end{funcdesc}

\begin{funcdesc}{nextfile}{}
  Close the current file so that the next iteration will read the
  first line from the next file (if any); lines not read from the file
  will not count towards the cumulative line count.  The filename is
  not changed until after the first line of the next file has been
  read.  Before the first line has been read, this function has no
  effect; it cannot be used to skip the first file.  After the last
  line of the last file has been read, this function has no effect.
\end{funcdesc}

\begin{funcdesc}{close}{}
  Close the sequence.
\end{funcdesc}


The class which implements the sequence behavior provided by the
module is available for subclassing as well:

\begin{classdesc}{FileInput}{\optional{files\optional{,
                             inplace\optional{, backup\optional{,
                             mode\optional{, openhook}}}}}}
  Class \class{FileInput} is the implementation; its methods
  \method{filename()}, \method{fileno()}, \method{lineno()},
  \method{fileline()}, \method{isfirstline()}, \method{isstdin()},
  \method{nextfile()} and \method{close()} correspond to the functions
  of the same name in the module.
  In addition it has a \method{readline()} method which
  returns the next input line, and a \method{__getitem__()} method
  which implements the sequence behavior.  The sequence must be
  accessed in strictly sequential order; random access and
  \method{readline()} cannot be mixed.

  With \var{mode} you can specify which file mode will be passed to
  \function{open()}. It must be one of \code{'r'}, \code{'rU'},
  \code{'U'} and \code{'rb'}.

  The \var{openhook}, when given, must be a function that takes two arguments,
  \var{filename} and \var{mode}, and returns an accordingly opened
  file-like object.
  You cannot use \var{inplace} and \var{openhook} together.

  \versionchanged[Added the \var{mode} and \var{openhook} parameters]{2.5}
\end{classdesc}

\strong{Optional in-place filtering:} if the keyword argument
\code{\var{inplace}=1} is passed to \function{input()} or to the
\class{FileInput} constructor, the file is moved to a backup file and
standard output is directed to the input file (if a file of the same
name as the backup file already exists, it will be replaced silently).
This makes it possible to write a filter that rewrites its input file
in place.  If the keyword argument \code{\var{backup}='.<some
extension>'} is also given, it specifies the extension for the backup
file, and the backup file remains around; by default, the extension is
\code{'.bak'} and it is deleted when the output file is closed.  In-place
filtering is disabled when standard input is read.

\strong{Caveat:} The current implementation does not work for MS-DOS
8+3 filesystems.


The two following opening hooks are provided by this module:

\begin{funcdesc}{hook_compressed}{filename, mode}
  Transparently opens files compressed with gzip and bzip2 (recognized
  by the extensions \code{'.gz'} and \code{'.bz2'}) using the \module{gzip}
  and \module{bz2} modules as well as normal files.

  Usage example: 
  \samp{fi = fileinput.FileInput(openhook=fileinput.hook_compressed)}

  \versionadded{2.5}
\end{funcdesc}

\begin{funcdesc}{hook_encoded}{encoding}
  Returns a hook which opens each file with \function{codecs.open()},
  using the given \var{encoding} to read the file.

  Usage example:
  \samp{fi = fileinput.FileInput(openhook=fileinput.hook_encoded("iso-8859-1"))}

  \note{With this hook, \class{FileInput} might return Unicode strings
        depending on the specified \var{encoding}.}
  \versionadded{2.5}
\end{funcdesc}