Diffstat (limited to 'Doc/lib/libtarfile.tex')
-rw-r--r-- | Doc/lib/libtarfile.tex | 664 |
1 files changed, 0 insertions, 664 deletions
diff --git a/Doc/lib/libtarfile.tex b/Doc/lib/libtarfile.tex deleted file mode 100644 index 95ea051..0000000 --- a/Doc/lib/libtarfile.tex +++ /dev/null @@ -1,664 +0,0 @@ -\section{\module{tarfile} --- Read and write tar archive files} - -\declaremodule{standard}{tarfile} -\modulesynopsis{Read and write tar-format archive files.} -\versionadded{2.3} - -\moduleauthor{Lars Gust\"abel}{lars@gustaebel.de} -\sectionauthor{Lars Gust\"abel}{lars@gustaebel.de} - -The \module{tarfile} module makes it possible to read and create tar archives. -Some facts and figures: - -\begin{itemize} -\item reads and writes \module{gzip} and \module{bzip2} compressed archives. -\item read/write support for the \POSIX{}.1-1988 (ustar) format. -\item read/write support for the GNU tar format including \emph{longname} and - \emph{longlink} extensions, read-only support for the \emph{sparse} - extension. -\item read/write support for the \POSIX{}.1-2001 (pax) format. - \versionadded{2.6} -\item handles directories, regular files, hardlinks, symbolic links, fifos, - character devices and block devices and is able to acquire and - restore file information like timestamp, access permissions and owner. -\item can handle tape devices. -\end{itemize} - -\begin{funcdesc}{open}{name\optional{, mode\optional{, - fileobj\optional{, bufsize}}}, **kwargs} - Return a \class{TarFile} object for the pathname \var{name}. - For detailed information on \class{TarFile} objects and the keyword - arguments that are allowed, see \citetitle{TarFile Objects} - (section \ref{tarfile-objects}). - - \var{mode} has to be a string of the form \code{'filemode[:compression]'}, - it defaults to \code{'r'}. 
Here is a full list of mode combinations: - - \begin{tableii}{c|l}{code}{mode}{action} - \lineii{'r' or 'r:*'}{Open for reading with transparent compression (recommended).} - \lineii{'r:'}{Open for reading exclusively without compression.} - \lineii{'r:gz'}{Open for reading with gzip compression.} - \lineii{'r:bz2'}{Open for reading with bzip2 compression.} - \lineii{'a' or 'a:'}{Open for appending with no compression. The file - is created if it does not exist.} - \lineii{'w' or 'w:'}{Open for uncompressed writing.} - \lineii{'w:gz'}{Open for gzip compressed writing.} - \lineii{'w:bz2'}{Open for bzip2 compressed writing.} - \end{tableii} - - Note that \code{'a:gz'} or \code{'a:bz2'} is not possible. - If \var{mode} is not suitable to open a certain (compressed) file for - reading, \exception{ReadError} is raised. Use \var{mode} \code{'r'} to - avoid this. If a compression method is not supported, - \exception{CompressionError} is raised. - - If \var{fileobj} is specified, it is used as an alternative to a file - object opened for \var{name}. It is supposed to be at position 0. - - For special purposes, there is a second format for \var{mode}: - \code{'filemode|[compression]'}. \function{open()} will return a - \class{TarFile} object that processes its data as a stream of - blocks. No random seeking will be done on the file. If given, - \var{fileobj} may be any object that has a \method{read()} or - \method{write()} method (depending on the \var{mode}). - \var{bufsize} specifies the blocksize and defaults to \code{20 * - 512} bytes. Use this variant in combination with - e.g. \code{sys.stdin}, a socket file object or a tape device. - However, such a \class{TarFile} object is limited in that it does - not allow to be accessed randomly, see ``Examples'' - (section~\ref{tar-examples}). 
The currently possible modes: - - \begin{tableii}{c|l}{code}{Mode}{Action} - \lineii{'r|*'}{Open a \emph{stream} of tar blocks for reading with transparent compression.} - \lineii{'r|'}{Open a \emph{stream} of uncompressed tar blocks for reading.} - \lineii{'r|gz'}{Open a gzip compressed \emph{stream} for reading.} - \lineii{'r|bz2'}{Open a bzip2 compressed \emph{stream} for reading.} - \lineii{'w|'}{Open an uncompressed \emph{stream} for writing.} - \lineii{'w|gz'}{Open a gzip compressed \emph{stream} for writing.} - \lineii{'w|bz2'}{Open a bzip2 compressed \emph{stream} for writing.} - \end{tableii} -\end{funcdesc} - -\begin{classdesc*}{TarFile} - Class for reading and writing tar archives. Do not use this - class directly; use \function{open()} instead. - See ``TarFile Objects'' (section~\ref{tarfile-objects}). -\end{classdesc*} - -\begin{funcdesc}{is_tarfile}{name} - Return \constant{True} if \var{name} is a tar archive file that - the \module{tarfile} module can read. -\end{funcdesc} - -\begin{classdesc}{TarFileCompat}{filename\optional{, mode\optional{, - compression}}} - Class for limited access to tar archives with a - \refmodule{zipfile}-like interface. Please consult the - documentation of the \refmodule{zipfile} module for more details. - \var{compression} must be one of the following constants: - \begin{datadesc}{TAR_PLAIN} - Constant for an uncompressed tar archive. - \end{datadesc} - \begin{datadesc}{TAR_GZIPPED} - Constant for a \refmodule{gzip} compressed tar archive. - \end{datadesc} -\end{classdesc} - -\begin{excdesc}{TarError} - Base class for all \module{tarfile} exceptions. -\end{excdesc} - -\begin{excdesc}{ReadError} - Is raised when a tar archive is opened that either cannot be handled by - the \module{tarfile} module or is somehow invalid. -\end{excdesc} - -\begin{excdesc}{CompressionError} - Is raised when a compression method is not supported or when the data - cannot be decoded properly. 
-\end{excdesc} - -\begin{excdesc}{StreamError} - Is raised for the limitations that are typical for stream-like - \class{TarFile} objects. -\end{excdesc} - -\begin{excdesc}{ExtractError} - Is raised for \emph{non-fatal} errors when using \method{extract()}, but - only if \member{TarFile.errorlevel}\code{ == 2}. -\end{excdesc} - -\begin{excdesc}{HeaderError} - Is raised by \method{frombuf()} if the buffer it gets is invalid. - \versionadded{2.6} -\end{excdesc} - -Each of the following constants defines a tar archive format that the -\module{tarfile} module is able to create. See section \ref{tar-formats} for -details. - -\begin{datadesc}{USTAR_FORMAT} - \POSIX{}.1-1988 (ustar) format. -\end{datadesc} - -\begin{datadesc}{GNU_FORMAT} - GNU tar format. -\end{datadesc} - -\begin{datadesc}{PAX_FORMAT} - \POSIX{}.1-2001 (pax) format. -\end{datadesc} - -\begin{datadesc}{DEFAULT_FORMAT} - The default format for creating archives. This is currently - \constant{GNU_FORMAT}. -\end{datadesc} - -\begin{seealso} - \seemodule{zipfile}{Documentation of the \refmodule{zipfile} - standard module.} - - \seetitle[http://www.gnu.org/software/tar/manual/html_node/tar_134.html\#SEC134] - {GNU tar manual, Basic Tar Format}{Documentation for tar archive files, - including GNU tar extensions.} -\end{seealso} - -%----------------- -% TarFile Objects -%----------------- - -\subsection{TarFile Objects \label{tarfile-objects}} - -The \class{TarFile} object provides an interface to a tar archive. A tar -archive is a sequence of blocks. An archive member (a stored file) is made up -of a header block followed by data blocks. It is possible to store a file in a -tar archive several times. Each archive member is represented by a -\class{TarInfo} object, see \citetitle{TarInfo Objects} (section -\ref{tarinfo-objects}) for details. 
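The member model described above (an archive may hold the same name several times, each occurrence being its own \class{TarInfo}) can be illustrated with a minimal sketch. It assumes a current Python 3 \module{tarfile} rather than the 2.6-era module this file documents; the helper \code{add_bytes} and the member name \code{notes.txt} are hypothetical and exist only for this illustration.

```python
import io
import tarfile


def add_bytes(tar, name, payload):
    """Hypothetical helper: store `payload` under `name` as a regular file."""
    info = tarfile.TarInfo(name)
    info.size = len(payload)  # addfile() reads exactly info.size bytes
    tar.addfile(info, io.BytesIO(payload))


# Build a small uncompressed archive in memory (assumes Python 3).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_bytes(tar, "notes.txt", b"first version")
    add_bytes(tar, "notes.txt", b"second version")  # same name, stored again

# Read it back: both occurrences are kept, each with its own TarInfo.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:") as tar:
    names = tar.getnames()
    sizes = [member.size for member in tar.getmembers()]
    # getmember() resolves a duplicated name to its last occurrence.
    latest = tar.extractfile(tar.getmember("notes.txt")).read()
```

Both occurrences remain in the list of members, in archive order, while a lookup by name yields the most recent one.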
- -\begin{classdesc}{TarFile}{name=None, mode='r', fileobj=None, - format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, - ignore_zeros=False, encoding=None, errors=None, pax_headers=None, - debug=0, errorlevel=0} - - All following arguments are optional and can be accessed as instance - attributes as well. - - \var{name} is the pathname of the archive. It can be omitted if - \var{fileobj} is given. In this case, the file object's \member{name} - attribute is used if it exists. - - \var{mode} is either \code{'r'} to read from an existing archive, - \code{'a'} to append data to an existing file or \code{'w'} to create a new - file overwriting an existing one. - - If \var{fileobj} is given, it is used for reading or writing data. - If it can be determined, \var{mode} is overridden by \var{fileobj}'s mode. - \var{fileobj} will be used from position 0. - \begin{notice} - \var{fileobj} is not closed, when \class{TarFile} is closed. - \end{notice} - - \var{format} controls the archive format. It must be one of the constants - \constant{USTAR_FORMAT}, \constant{GNU_FORMAT} or \constant{PAX_FORMAT} - that are defined at module level. - \versionadded{2.6} - - The \var{tarinfo} argument can be used to replace the default - \class{TarInfo} class with a different one. - \versionadded{2.6} - - If \var{dereference} is \code{False}, add symbolic and hard links to the - archive. If it is \code{True}, add the content of the target files to the - archive. This has no effect on systems that do not support symbolic links. - - If \var{ignore_zeros} is \code{False}, treat an empty block as the end of - the archive. If it is \var{True}, skip empty (and invalid) blocks and try - to get as many members as possible. This is only useful for reading - concatenated or damaged archives. - - \var{debug} can be set from \code{0} (no debug messages) up to \code{3} - (all debug messages). The messages are written to \code{sys.stderr}. 
- - If \var{errorlevel} is \code{0}, all errors are ignored when using - \method{extract()}. Nevertheless, they appear as error messages in the - debug output, when debugging is enabled. If \code{1}, all \emph{fatal} - errors are raised as \exception{OSError} or \exception{IOError} exceptions. - If \code{2}, all \emph{non-fatal} errors are raised as \exception{TarError} - exceptions as well. - - The \var{encoding} and \var{errors} arguments control the way strings are - converted to unicode objects and vice versa. The default settings will work - for most users. See section \ref{tar-unicode} for in-depth information. - \versionadded{2.6} - - The \var{pax_headers} argument is an optional dictionary of unicode strings - which will be added as a pax global header if \var{format} is - \constant{PAX_FORMAT}. - \versionadded{2.6} -\end{classdesc} - -\begin{methoddesc}{open}{...} - Alternative constructor. The \function{open()} function on module level is - actually a shortcut to this classmethod. See section~\ref{module-tarfile} - for details. -\end{methoddesc} - -\begin{methoddesc}{getmember}{name} - Return a \class{TarInfo} object for member \var{name}. If \var{name} can - not be found in the archive, \exception{KeyError} is raised. - \begin{notice} - If a member occurs more than once in the archive, its last - occurrence is assumed to be the most up-to-date version. - \end{notice} -\end{methoddesc} - -\begin{methoddesc}{getmembers}{} - Return the members of the archive as a list of \class{TarInfo} objects. - The list has the same order as the members in the archive. -\end{methoddesc} - -\begin{methoddesc}{getnames}{} - Return the members as a list of their names. It has the same order as - the list returned by \method{getmembers()}. -\end{methoddesc} - -\begin{methoddesc}{list}{verbose=True} - Print a table of contents to \code{sys.stdout}. If \var{verbose} is - \constant{False}, only the names of the members are printed. 
If it is - \constant{True}, output similar to that of \program{ls -l} is produced. -\end{methoddesc} - -\begin{methoddesc}{next}{} - Return the next member of the archive as a \class{TarInfo} object, when - \class{TarFile} is opened for reading. Return \code{None} if there is no - more available. -\end{methoddesc} - -\begin{methoddesc}{extractall}{\optional{path\optional{, members}}} - Extract all members from the archive to the current working directory - or directory \var{path}. If optional \var{members} is given, it must be - a subset of the list returned by \method{getmembers()}. - Directory information like owner, modification time and permissions are - set after all members have been extracted. This is done to work around two - problems: A directory's modification time is reset each time a file is - created in it. And, if a directory's permissions do not allow writing, - extracting files to it will fail. - \versionadded{2.5} -\end{methoddesc} - -\begin{methoddesc}{extract}{member\optional{, path}} - Extract a member from the archive to the current working directory, - using its full name. Its file information is extracted as accurately as - possible. - \var{member} may be a filename or a \class{TarInfo} object. - You can specify a different directory using \var{path}. - \begin{notice} - Because the \method{extract()} method allows random access to a tar - archive there are some issues you must take care of yourself. See the - description for \method{extractall()} above. - \end{notice} -\end{methoddesc} - -\begin{methoddesc}{extractfile}{member} - Extract a member from the archive as a file object. - \var{member} may be a filename or a \class{TarInfo} object. - If \var{member} is a regular file, a file-like object is returned. - If \var{member} is a link, a file-like object is constructed from the - link's target. - If \var{member} is none of the above, \code{None} is returned. 
- \begin{notice} - The file-like object is read-only and provides the following methods: - \method{read()}, \method{readline()}, \method{readlines()}, - \method{seek()}, \method{tell()}. - \end{notice} -\end{methoddesc} - -\begin{methoddesc}{add}{name\optional{, arcname\optional{, recursive\optional{, exclude}}}} - Add the file \var{name} to the archive. \var{name} may be any type - of file (directory, fifo, symbolic link, etc.). - If given, \var{arcname} specifies an alternative name for the file in the - archive. Directories are added recursively by default. - This can be avoided by setting \var{recursive} to \constant{False}. - If \var{exclude} is given it must be a function that takes one filename - argument and returns a boolean value. Depending on this value the - respective file is either excluded (\constant{True}) or added - (\constant{False}). - \versionchanged[Added the \var{exclude} parameter]{2.6} -\end{methoddesc} - -\begin{methoddesc}{addfile}{tarinfo\optional{, fileobj}} - Add the \class{TarInfo} object \var{tarinfo} to the archive. - If \var{fileobj} is given, \code{\var{tarinfo}.size} bytes are read - from it and added to the archive. You can create \class{TarInfo} objects - using \method{gettarinfo()}. - \begin{notice} - On Windows platforms, \var{fileobj} should always be opened with mode - \code{'rb'} to avoid irritation about the file size. - \end{notice} -\end{methoddesc} - -\begin{methoddesc}{gettarinfo}{\optional{name\optional{, - arcname\optional{, fileobj}}}} - Create a \class{TarInfo} object for either the file \var{name} or - the file object \var{fileobj} (using \function{os.fstat()} on its - file descriptor). You can modify some of the \class{TarInfo}'s - attributes before you add it using \method{addfile()}. If given, - \var{arcname} specifies an alternative name for the file in the - archive. -\end{methoddesc} - -\begin{methoddesc}{close}{} - Close the \class{TarFile}. 
In write mode, two finishing zero - blocks are appended to the archive. -\end{methoddesc} - -\begin{memberdesc}{posix} - Setting this to \constant{True} is equivalent to setting the - \member{format} attribute to \constant{USTAR_FORMAT}, - \constant{False} is equivalent to \constant{GNU_FORMAT}. - \versionchanged[\var{posix} defaults to \constant{False}]{2.4} - \deprecated{2.6}{Use the \member{format} attribute instead.} -\end{memberdesc} - -\begin{memberdesc}{pax_headers} - A dictionary containing key-value pairs of pax global headers. - \versionadded{2.6} -\end{memberdesc} - -%----------------- -% TarInfo Objects -%----------------- - -\subsection{TarInfo Objects \label{tarinfo-objects}} - -A \class{TarInfo} object represents one member in a -\class{TarFile}. Aside from storing all required attributes of a file -(like file type, size, time, permissions, owner etc.), it provides -some useful methods to determine its type. It does \emph{not} contain -the file's data itself. - -\class{TarInfo} objects are returned by \class{TarFile}'s methods -\method{getmember()}, \method{getmembers()} and \method{gettarinfo()}. - -\begin{classdesc}{TarInfo}{\optional{name}} - Create a \class{TarInfo} object. -\end{classdesc} - -\begin{methoddesc}{frombuf}{buf} - Create and return a \class{TarInfo} object from string buffer \var{buf}. - \versionadded[Raises \exception{HeaderError} if the buffer is - invalid.]{2.6} -\end{methoddesc} - -\begin{methoddesc}{fromtarfile}{tarfile} - Read the next member from the \class{TarFile} object \var{tarfile} and - return it as a \class{TarInfo} object. - \versionadded{2.6} -\end{methoddesc} - -\begin{methoddesc}{tobuf}{\optional{format\optional{, encoding - \optional{, errors}}}} - Create a string buffer from a \class{TarInfo} object. For information - on the arguments see the constructor of the \class{TarFile} class. 
- \versionchanged[The arguments were added]{2.6} -\end{methoddesc} - -A \code{TarInfo} object has the following public data attributes: - -\begin{memberdesc}{name} - Name of the archive member. -\end{memberdesc} - -\begin{memberdesc}{size} - Size in bytes. -\end{memberdesc} - -\begin{memberdesc}{mtime} - Time of last modification. -\end{memberdesc} - -\begin{memberdesc}{mode} - Permission bits. -\end{memberdesc} - -\begin{memberdesc}{type} - File type. \var{type} is usually one of these constants: - \constant{REGTYPE}, \constant{AREGTYPE}, \constant{LNKTYPE}, - \constant{SYMTYPE}, \constant{DIRTYPE}, \constant{FIFOTYPE}, - \constant{CONTTYPE}, \constant{CHRTYPE}, \constant{BLKTYPE}, - \constant{GNUTYPE_SPARSE}. To determine the type of a - \class{TarInfo} object more conveniently, use the \code{is_*()} - methods below. -\end{memberdesc} - -\begin{memberdesc}{linkname} - Name of the target file, which is only present in - \class{TarInfo} objects of type \constant{LNKTYPE} and - \constant{SYMTYPE}. -\end{memberdesc} - -\begin{memberdesc}{uid} - User ID of the user who originally stored this member. -\end{memberdesc} - -\begin{memberdesc}{gid} - Group ID of the user who originally stored this member. -\end{memberdesc} - -\begin{memberdesc}{uname} - User name. -\end{memberdesc} - -\begin{memberdesc}{gname} - Group name. -\end{memberdesc} - -\begin{memberdesc}{pax_headers} - A dictionary containing key-value pairs of an associated pax - extended header. - \versionadded{2.6} -\end{memberdesc} - -A \class{TarInfo} object also provides some convenient query methods: - -\begin{methoddesc}{isfile}{} - Return \constant{True} if the \class{TarInfo} object is a regular - file. -\end{methoddesc} - -\begin{methoddesc}{isreg}{} - Same as \method{isfile()}. -\end{methoddesc} - -\begin{methoddesc}{isdir}{} - Return \constant{True} if it is a directory. -\end{methoddesc} - -\begin{methoddesc}{issym}{} - Return \constant{True} if it is a symbolic link. 
-\end{methoddesc} - -\begin{methoddesc}{islnk}{} - Return \constant{True} if it is a hard link. -\end{methoddesc} - -\begin{methoddesc}{ischr}{} - Return \constant{True} if it is a character device. -\end{methoddesc} - -\begin{methoddesc}{isblk}{} - Return \constant{True} if it is a block device. -\end{methoddesc} - -\begin{methoddesc}{isfifo}{} - Return \constant{True} if it is a FIFO. -\end{methoddesc} - -\begin{methoddesc}{isdev}{} - Return \constant{True} if it is one of character device, block - device or FIFO. -\end{methoddesc} - -%------------------------ -% Examples -%------------------------ - -\subsection{Examples \label{tar-examples}} - -How to extract an entire tar archive to the current working directory: -\begin{verbatim} -import tarfile -tar = tarfile.open("sample.tar.gz") -tar.extractall() -tar.close() -\end{verbatim} - -How to create an uncompressed tar archive from a list of filenames: -\begin{verbatim} -import tarfile -tar = tarfile.open("sample.tar", "w") -for name in ["foo", "bar", "quux"]: - tar.add(name) -tar.close() -\end{verbatim} - -How to read a gzip compressed tar archive and display some member information: -\begin{verbatim} -import tarfile -tar = tarfile.open("sample.tar.gz", "r:gz") -for tarinfo in tar: - print tarinfo.name, "is", tarinfo.size, "bytes in size and is", - if tarinfo.isreg(): - print "a regular file." - elif tarinfo.isdir(): - print "a directory." - else: - print "something else." 
-tar.close() -\end{verbatim} - -How to create a tar archive with faked information: -\begin{verbatim} -import tarfile -tar = tarfile.open("sample.tar.gz", "w:gz") -for name in namelist: - tarinfo = tar.gettarinfo(name, "fakeproj-1.0/" + name) - tarinfo.uid = 123 - tarinfo.gid = 456 - tarinfo.uname = "johndoe" - tarinfo.gname = "fake" - tar.addfile(tarinfo, file(name)) -tar.close() -\end{verbatim} - -The \emph{only} way to extract an uncompressed tar stream from -\code{sys.stdin}: -\begin{verbatim} -import sys -import tarfile -tar = tarfile.open(mode="r|", fileobj=sys.stdin) -for tarinfo in tar: - tar.extract(tarinfo) -tar.close() -\end{verbatim} - -%------------ -% Tar format -%------------ - -\subsection{Supported tar formats \label{tar-formats}} - -There are three tar formats that can be created with the \module{tarfile} -module: - -\begin{itemize} - -\item -The \POSIX{}.1-1988 ustar format (\constant{USTAR_FORMAT}). It supports -filenames up to a length of at best 256 characters and linknames up to 100 -characters. The maximum file size is 8 gigabytes. This is an old and limited -but widely supported format. - -\item -The GNU tar format (\constant{GNU_FORMAT}). It supports long filenames and -linknames, files bigger than 8 gigabytes and sparse files. It is the de facto -standard on GNU/Linux systems. \module{tarfile} fully supports the GNU tar -extensions for long names, sparse file support is read-only. - -\item -The \POSIX{}.1-2001 pax format (\constant{PAX_FORMAT}). It is the most -flexible format with virtually no limits. It supports long filenames and -linknames, large files and stores pathnames in a portable way. However, not -all tar implementations today are able to handle pax archives properly. - -The \emph{pax} format is an extension to the existing \emph{ustar} format. It -uses extra headers for information that cannot be stored otherwise. 
There are -two flavours of pax headers: Extended headers only affect the subsequent file -header, global headers are valid for the complete archive and affect all -following files. All the data in a pax header is encoded in \emph{UTF-8} for -portability reasons. - -\end{itemize} - -There are some more variants of the tar format which can be read, but not -created: - -\begin{itemize} - -\item -The ancient V7 format. This is the first tar format from \UNIX{} Seventh -Edition, storing only regular files and directories. Names must not be longer -than 100 characters, there is no user/group name information. Some archives -have miscalculated header checksums in case of fields with non-\ASCII{} -characters. - -\item -The SunOS tar extended format. This format is a variant of the \POSIX{}.1-2001 -pax format, but is not compatible. - -\end{itemize} - -%---------------- -% Unicode issues -%---------------- - -\subsection{Unicode issues \label{tar-unicode}} - -The tar format was originally conceived to make backups on tape drives with the -main focus on preserving file system information. Nowadays tar archives are -commonly used for file distribution and exchanging archives over networks. One -problem of the original format (that all other formats are merely variants of) -is that there is no concept of supporting different character encodings. -For example, an ordinary tar archive created on a \emph{UTF-8} system cannot be -read correctly on a \emph{Latin-1} system if it contains non-\ASCII{} -characters. Names (i.e. filenames, linknames, user/group names) containing -these characters will appear damaged. Unfortunately, there is no way to -autodetect the encoding of an archive. - -The pax format was designed to solve this problem. It stores non-\ASCII{} names -using the universal character encoding \emph{UTF-8}. When a pax archive is -read, these \emph{UTF-8} names are converted to the encoding of the local -file system. 
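The encoding problem and the pax remedy described above can be sketched as follows. This is an illustration under stated assumptions, not the module's reference behaviour: it assumes a current Python 3 \module{tarfile} (where names are text strings), simulates the ``UTF-8 system'' and the ``Latin-1 system'' purely through the \var{encoding} argument, and uses a hypothetical member name and helper \code{roundtrip_name}.

```python
import io
import tarfile

NAME = "gr\u00fc\u00dfe.txt"  # hypothetical non-ASCII member name


def roundtrip_name(fmt):
    """Write NAME with a UTF-8 writer, read it back with a Latin-1 reader."""
    buf = io.BytesIO()
    # Writer side: stands in for a system whose local encoding is UTF-8.
    with tarfile.open(fileobj=buf, mode="w", format=fmt,
                      encoding="utf-8") as tar:
        tar.addfile(tarfile.TarInfo(NAME))  # empty regular file
    buf.seek(0)
    # Reader side: stands in for a system whose local encoding is Latin-1.
    with tarfile.open(fileobj=buf, mode="r:", encoding="latin-1") as tar:
        return tar.getnames()[0]


# GNU format keeps the raw encoded bytes in the header, so the reader's
# encoding decides how they are interpreted and the name comes back damaged.
gnu_name = roundtrip_name(tarfile.GNU_FORMAT)

# pax format stores the name as UTF-8 in an extended header, so the
# reader's local encoding does not affect it.
pax_name = roundtrip_name(tarfile.PAX_FORMAT)
```

Under these assumptions only the pax round trip returns the original name; the GNU round trip yields a mis-decoded string because nothing in the archive records which encoding the writer used.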
- -The details of unicode conversion are controlled by the \var{encoding} and -\var{errors} keyword arguments of the \class{TarFile} class. - -The default value for \var{encoding} is the local character encoding. It is -deduced from \function{sys.getfilesystemencoding()} and -\function{sys.getdefaultencoding()}. In read mode, \var{encoding} is used -exclusively to convert unicode names from a pax archive to strings in the local -character encoding. In write mode, the use of \var{encoding} depends on the -chosen archive format. In case of \constant{PAX_FORMAT}, input names that -contain non-\ASCII{} characters need to be decoded before being stored as -\emph{UTF-8} strings. The other formats do not make use of \var{encoding} -unless unicode objects are used as input names. These are converted to -8-bit character strings before they are added to the archive. - -The \var{errors} argument defines how characters are treated that cannot be -converted to or from \var{encoding}. Possible values are listed in section -\ref{codec-base-classes}. In read mode, there is an additional scheme -\code{'utf-8'} which means that bad characters are replaced by their -\emph{UTF-8} representation. This is the default scheme. In write mode the -default value for \var{errors} is \code{'strict'} to ensure that name -information is not altered unnoticed.