summaryrefslogtreecommitdiffstats
path: root/Utilities/cmlibarchive/libarchive/libarchive-formats.5
diff options
context:
space:
mode:
Diffstat (limited to 'Utilities/cmlibarchive/libarchive/libarchive-formats.5')
-rw-r--r--Utilities/cmlibarchive/libarchive/libarchive-formats.5464
1 files changed, 464 insertions, 0 deletions
diff --git a/Utilities/cmlibarchive/libarchive/libarchive-formats.5 b/Utilities/cmlibarchive/libarchive/libarchive-formats.5
new file mode 100644
index 0000000..5a118ff
--- /dev/null
+++ b/Utilities/cmlibarchive/libarchive/libarchive-formats.5
@@ -0,0 +1,464 @@
+.\" Copyright (c) 2003-2009 Tim Kientzle
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd December 27, 2016
+.Dt LIBARCHIVE-FORMATS 5
+.Os
+.Sh NAME
+.Nm libarchive-formats
+.Nd archive formats supported by the libarchive library
+.Sh DESCRIPTION
+The
+.Xr libarchive 3
+library reads and writes a variety of streaming archive formats.
+Generally speaking, all of these archive formats consist of a series of
+.Dq entries .
+Each entry stores a single file system object, such as a file, directory,
+or symbolic link.
+.Pp
+The following provides a brief description of each format supported
+by libarchive, with some information about recognized extensions or
+limitations of the current library support.
+Note that just because a format is supported by libarchive does not
+imply that a program that uses libarchive will support that format.
+Applications that use libarchive specify which formats they wish
+to support, though many programs do use libarchive convenience
+functions to enable all supported formats.
+.Ss Tar Formats
+The
+.Xr libarchive 3
+library can read most tar archives.
+It can write POSIX-standard
+.Dq ustar
+and
+.Dq pax interchange
+formats as well as v7 tar format and a subset of the legacy GNU tar format.
+.Pp
+All tar formats store each entry in one or more 512-byte records.
+The first record is used for file metadata, including filename,
+timestamp, and mode information, and the file data is stored in
+subsequent records.
+Later variants have extended this by either appropriating undefined
+areas of the header record, extending the header to multiple records,
+or by storing special entries that modify the interpretation of
+subsequent entries.
+.Bl -tag -width indent
+.It Cm gnutar
+The
+.Xr libarchive 3
+library can read most GNU-format tar archives.
+It currently supports the most popular GNU extensions, including
+modern long filename and linkname support, as well as atime and ctime data.
+The libarchive library does not support multi-volume
+archives, nor the old GNU long filename format.
+It can read GNU sparse file entries, including the new POSIX-based
+formats.
+.Pp
+The
+.Xr libarchive 3
+library can write GNU tar format, including long filename
+and linkname support, as well as atime and ctime data.
+.It Cm pax
+The
+.Xr libarchive 3
+library can read and write POSIX-compliant pax interchange format
+archives.
+Pax interchange format archives are an extension of the older ustar
+format that adds a separate entry with additional attributes stored
+as key/value pairs immediately before each regular entry.
+The presence of these additional entries is the only difference between
+pax interchange format and the older ustar format.
+The extended attributes are of unlimited length and are stored
+as UTF-8 Unicode strings.
+Keywords defined in the standard are in all lowercase; vendors are allowed
+to define custom keys by preceding them with the vendor name in all uppercase.
+When writing pax archives, libarchive uses many of the SCHILY keys
+defined by Joerg Schilling's
+.Dq star
+archiver and a few LIBARCHIVE keys.
+The libarchive library can read most of the SCHILY keys
+and most of the GNU keys introduced by GNU tar.
+It silently ignores any keywords that it does not understand.
+.Pp
+The pax interchange format converts filenames to Unicode
+and stores them using the UTF-8 encoding.
+Prior to libarchive 3.0, libarchive erroneously assumed
+that the system wide-character routines natively supported
+Unicode.
+This caused it to mis-handle non-ASCII filenames on systems
+that did not satisfy this assumption.
+.It Cm restricted pax
+The libarchive library can also write pax archives in which it
+attempts to suppress the extended attributes entry whenever
+possible.
+The result will be identical to a ustar archive unless the
+extended attributes entry is required to store a long file
+name, long linkname, extended ACL, file flags, or if any of the standard
+ustar data (user name, group name, UID, GID, etc) cannot be fully
+represented in the ustar header.
+In all cases, the result can be dearchived by any program that
+can read POSIX-compliant pax interchange format archives.
+Programs that correctly read ustar format (see below) will also be
+able to read this format; any extended attributes will be extracted as
+separate files stored in
+.Pa PaxHeader
+directories.
+.It Cm ustar
+The libarchive library can both read and write this format.
+This format has the following limitations:
+.Bl -bullet -compact
+.It
+Device major and minor numbers are limited to 21 bits.
+Nodes with larger numbers will not be added to the archive.
+.It
+Path names in the archive are limited to 255 bytes.
+(Shorter if there is no / character in exactly the right place.)
+.It
+Symbolic links and hard links are stored in the archive with
+the name of the referenced file.
+This name is limited to 100 bytes.
+.It
+Extended attributes, file flags, and other extended
+security information cannot be stored.
+.It
+Archive entries are limited to 8 gigabytes in size.
+.El
+Note that the pax interchange format has none of these restrictions.
+The ustar format is old and widely supported.
+It is recommended when compatibility is the primary concern.
+.It Cm v7
+The libarchive library can read and write the legacy v7 tar format.
+This format has the following limitations:
+.Bl -bullet -compact
+.It
+Only regular files, directories, and symbolic links can be archived.
+Block and character device nodes, FIFOs, and sockets cannot be archived.
+.It
+Path names in the archive are limited to 100 bytes.
+.It
+Symbolic links and hard links are stored in the archive with
+the name of the referenced file.
+This name is limited to 100 bytes.
+.It
+User and group information are stored as numeric IDs; there
+is no provision for storing user or group names.
+.It
+Extended attributes, file flags, and other extended
+security information cannot be stored.
+.It
+Archive entries are limited to 8 gigabytes in size.
+.El
+Generally, users should prefer the ustar format for portability
+as the v7 tar format is both less useful and less portable.
+.El
+.Pp
+The libarchive library also reads a variety of commonly-used extensions to
+the basic tar format.
+These extensions are recognized automatically whenever they appear.
+.Bl -tag -width indent
+.It Numeric extensions.
+The POSIX standards require fixed-length numeric fields to be written with
+some character position reserved for terminators.
+Libarchive allows these fields to be written without terminator characters.
+This extends the allowable range; in particular, ustar archives with this
+extension can support entries up to 64 gigabytes in size.
+Libarchive also recognizes base-256 values in most numeric fields.
+This essentially removes all limitations on file size, modification time,
+and device numbers.
+.It Solaris extensions
+Libarchive recognizes ACL and extended attribute records written
+by Solaris tar.
+.El
+.Pp
+The first tar program appeared in Seventh Edition Unix in 1979.
+The first official standard for the tar file format was the
+.Dq ustar
+(Unix Standard Tar) format defined by POSIX in 1988.
+POSIX.1-2001 extended the ustar format to create the
+.Dq pax interchange
+format.
+.Ss Cpio Formats
+The libarchive library can read and write a number of common cpio
+variants. A cpio archive stores each entry as a fixed-size header
+followed by a variable-length filename and variable-length data.
+Unlike the tar format, the cpio format does only minimal padding of
+the header or file data. There are several cpio variants, which
+differ primarily in how they store the initial header: some store the
+values as octal or hexadecimal numbers in ASCII, others as binary
+values of varying byte order and length.
+.Bl -tag -width indent
+.It Cm binary
+The libarchive library transparently reads both big-endian and
+little-endian variants of the the two binary cpio formats; the
+original one from PWB/UNIX, and the later, more widely used, variant.
+This format used 32-bit binary values for file size and mtime, and
+16-bit binary values for the other fields. The formats support only
+the file types present in UNIX at the time of their creation. File
+sizes are limited to 24 bits in the PWB format, because of the limits
+of the file system, and to 31 bits in the newer binary format, where
+signed 32 bit longs were used.
+.It Cm odc
+This is the POSIX standardized format, which is officially known as the
+.Dq cpio interchange format
+or the
+.Dq octet-oriented cpio archive format
+and sometimes unofficially referred to as the
+.Dq old character format .
+This format stores the header contents as octal values in ASCII.
+It is standard, portable, and immune from byte-order confusion.
+File sizes and mtime are limited to 33 bits (8GB file size),
+other fields are limited to 18 bits.
+.It Cm SVR4/newc
+The libarchive library can read both CRC and non-CRC variants of
+this format.
+The SVR4 format uses eight-digit hexadecimal values for
+all header fields.
+This limits file size to 4GB, and also limits the mtime and
+other fields to 32 bits.
+The SVR4 format can optionally include a CRC of the file
+contents, although libarchive does not currently verify this CRC.
+.El
+.Pp
+Cpio first appeared in PWB/UNIX 1.0, which was released within
+AT&T in 1977.
+PWB/UNIX 1.0 formed the basis of System III Unix, released outside
+of AT&T in 1981.
+This makes cpio older than tar, although cpio was not included
+in Version 7 AT&T Unix.
+As a result, the tar command became much better known in universities
+and research groups that used Version 7.
+The combination of the
+.Nm find
+and
+.Nm cpio
+utilities provided very precise control over file selection.
+Unfortunately, the format has many limitations that make it unsuitable
+for widespread use.
+Only the POSIX format permits files over 4GB, and its 18-bit
+limit for most other fields makes it unsuitable for modern systems.
+In addition, cpio formats only store numeric UID/GID values (not
+usernames and group names), which can make it very difficult to correctly
+transfer archives across systems with dissimilar user numbering.
+.Ss Shar Formats
+A
+.Dq shell archive
+is a shell script that, when executed on a POSIX-compliant
+system, will recreate a collection of file system objects.
+The libarchive library can write two different kinds of shar archives:
+.Bl -tag -width indent
+.It Cm shar
+The traditional shar format uses a limited set of POSIX
+commands, including
+.Xr echo 1 ,
+.Xr mkdir 1 ,
+and
+.Xr sed 1 .
+It is suitable for portably archiving small collections of plain text files.
+However, it is not generally well-suited for large archives
+(many implementations of
+.Xr sh 1
+have limits on the size of a script) nor should it be used with non-text files.
+.It Cm shardump
+This format is similar to shar but encodes files using
+.Xr uuencode 1
+so that the result will be a plain text file regardless of the file contents.
+It also includes additional shell commands that attempt to reproduce as
+many file attributes as possible, including owner, mode, and flags.
+The additional commands used to restore file attributes make
+shardump archives less portable than plain shar archives.
+.El
+.Ss ISO9660 format
+Libarchive can read and extract from files containing ISO9660-compliant
+CDROM images.
+In many cases, this can remove the need to burn a physical CDROM
+just in order to read the files contained in an ISO9660 image.
+It also avoids security and complexity issues that come with
+virtual mounts and loopback devices.
+Libarchive supports the most common Rockridge extensions and has partial
+support for Joliet extensions.
+If both extensions are present, the Joliet extensions will be
+used and the Rockridge extensions will be ignored.
+In particular, this can create problems with hardlinks and symlinks,
+which are supported by Rockridge but not by Joliet.
+.Pp
+Libarchive reads ISO9660 images using a streaming strategy.
+This allows it to read compressed images directly
+(decompressing on the fly) and allows it to read images
+directly from network sockets, pipes, and other non-seekable
+data sources.
+This strategy works well for optimized ISO9660 images created
+by many popular programs.
+Such programs collect all directory information at the beginning
+of the ISO9660 image so it can be read from a physical disk
+with a minimum of seeking.
+However, not all ISO9660 images can be read in this fashion.
+.Pp
+Libarchive can also write ISO9660 images.
+Such images are fully optimized with the directory information
+preceding all file data.
+This is done by storing all file data to a temporary file
+while collecting directory information in memory.
+When the image is finished, libarchive writes out the
+directory structure followed by the file data.
+The location used for the temporary file can be changed
+by the usual environment variables.
+.Ss Zip format
+Libarchive can read and write zip format archives that have
+uncompressed entries and entries compressed with the
+.Dq deflate
+algorithm.
+Other zip compression algorithms are not supported.
+It can extract jar archives, archives that use Zip64 extensions and
+self-extracting zip archives.
+Libarchive can use either of two different strategies for
+reading Zip archives:
+a streaming strategy which is fast and can handle extremely
+large archives, and a seeking strategy which can correctly
+process self-extracting Zip archives and archives with
+deleted members or other in-place modifications.
+.Pp
+The streaming reader processes Zip archives as they are read.
+It can read archives of arbitrary size from tape or
+network sockets, and can decode Zip archives that have
+been separately compressed or encoded.
+However, self-extracting Zip archives and archives with
+certain types of modifications cannot be correctly
+handled.
+Such archives require that the reader first process the
+Central Directory, which is ordinarily located
+at the end of a Zip archive and is thus inaccessible
+to the streaming reader.
+If the program using libarchive has enabled seek support, then
+libarchive will use this to processes the central directory first.
+.Pp
+In particular, the seeking reader must be used to
+correctly handle self-extracting archives.
+Such archives consist of a program followed by a regular
+Zip archive.
+The streaming reader cannot parse the initial program
+portion, but the seeking reader starts by reading the
+Central Directory from the end of the archive.
+Similarly, Zip archives that have been modified in-place
+can have deleted entries or other garbage data that
+can only be accurately detected by first reading the
+Central Directory.
+.Ss Archive (library) file format
+The Unix archive format (commonly created by the
+.Xr ar 1
+archiver) is a general-purpose format which is
+used almost exclusively for object files to be
+read by the link editor
+.Xr ld 1 .
+The ar format has never been standardised.
+There are two common variants:
+the GNU format derived from SVR4,
+and the BSD format, which first appeared in 4.4BSD.
+The two differ primarily in their handling of filenames
+longer than 15 characters:
+the GNU/SVR4 variant writes a filename table at the beginning of the archive;
+the BSD format stores each long filename in an extension
+area adjacent to the entry.
+Libarchive can read both extensions,
+including archives that may include both types of long filenames.
+Programs using libarchive can write GNU/SVR4 format
+if they provide an entry called
+.Pa //
+containing a filename table to be written into the archive
+before any of the entries.
+Any entries whose names are not in the filename table
+will be written using BSD-style long filenames.
+This can cause problems for programs such as
+GNU ld that do not support the BSD-style long filenames.
+.Ss mtree
+Libarchive can read and write files in
+.Xr mtree 5
+format.
+This format is not a true archive format, but rather a textual description
+of a file hierarchy in which each line specifies the name of a file and
+provides specific metadata about that file.
+Libarchive can read all of the keywords supported by both
+the NetBSD and FreeBSD versions of
+.Xr mtree 8 ,
+although many of the keywords cannot currently be stored in an
+.Tn archive_entry
+object.
+When writing, libarchive supports use of the
+.Xr archive_write_set_options 3
+interface to specify which keywords should be included in the
+output.
+If libarchive was compiled with access to suitable
+cryptographic libraries (such as the OpenSSL libraries),
+it can compute hash entries such as
+.Cm sha512
+or
+.Cm md5
+from file data being written to the mtree writer.
+.Pp
+When reading an mtree file, libarchive will locate the corresponding
+files on disk using the
+.Cm contents
+keyword if present or the regular filename.
+If it can locate and open the file on disk, it will use that
+to fill in any metadata that is missing from the mtree file
+and will read the file contents and return those to the program
+using libarchive.
+If it cannot locate and open the file on disk, libarchive
+will return an error for any attempt to read the entry
+body.
+.Ss 7-Zip
+Libarchive can read and write 7-Zip format archives.
+TODO: Need more information
+.Ss CAB
+Libarchive can read Microsoft Cabinet (
+.Dq CAB )
+format archives.
+TODO: Need more information.
+.Ss LHA
+TODO: Information about libarchive's LHA support
+.Ss RAR
+Libarchive has limited support for reading RAR format archives.
+Currently, libarchive can read RARv3 format archives
+which have been either created uncompressed, or compressed using
+any of the compression methods supported by the RARv3 format.
+Libarchive can also read self-extracting RAR archives.
+.Ss Warc
+Libarchive can read and write
+.Dq web archives .
+TODO: Need more information
+.Ss XAR
+Libarchive can read and write the XAR format used by many Apple tools.
+TODO: Need more information
+.Sh SEE ALSO
+.Xr ar 1 ,
+.Xr cpio 1 ,
+.Xr mkisofs 1 ,
+.Xr shar 1 ,
+.Xr tar 1 ,
+.Xr zip 1 ,
+.Xr zlib 3 ,
+.Xr cpio 5 ,
+.Xr mtree 5 ,
+.Xr tar 5