summaryrefslogtreecommitdiffstats
path: root/Utilities/cmlibarchive/libarchive/tar.5
diff options
context:
space:
mode:
Diffstat (limited to 'Utilities/cmlibarchive/libarchive/tar.5')
-rw-r--r--Utilities/cmlibarchive/libarchive/tar.5947
1 files changed, 947 insertions, 0 deletions
diff --git a/Utilities/cmlibarchive/libarchive/tar.5 b/Utilities/cmlibarchive/libarchive/tar.5
new file mode 100644
index 0000000..688bb92
--- /dev/null
+++ b/Utilities/cmlibarchive/libarchive/tar.5
@@ -0,0 +1,947 @@
+.\" Copyright (c) 2003-2009 Tim Kientzle
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd December 23, 2011
+.Dt TAR 5
+.Os
+.Sh NAME
+.Nm tar
+.Nd format of tape archive files
+.Sh DESCRIPTION
+The
+.Nm
+archive format collects any number of files, directories, and other
+file system objects (symbolic links, device nodes, etc.) into a single
+stream of bytes.
+The format was originally designed to be used with
+tape drives that operate with fixed-size blocks, but is widely used as
+a general packaging mechanism.
+.Ss General Format
+A
+.Nm
+archive consists of a series of 512-byte records.
+Each file system object requires a header record which stores basic metadata
+(pathname, owner, permissions, etc.) and zero or more records containing any
+file data.
+The end of the archive is indicated by two records consisting
+entirely of zero bytes.
+.Pp
+For compatibility with tape drives that use fixed block sizes,
+programs that read or write tar files always read or write a fixed
+number of records with each I/O operation.
+These
+.Dq blocks
+are always a multiple of the record size.
+The maximum block size supported by early
+implementations was 10240 bytes or 20 records.
+This is still the default for most implementations
+although block sizes of 1MiB (2048 records) or larger are
+commonly used with modern high-speed tape drives.
+(Note: the terms
+.Dq block
+and
+.Dq record
+here are not entirely standard; this document follows the
+convention established by John Gilmore in documenting
+.Nm pdtar . )
+.Ss Old-Style Archive Format
+The original tar archive format has been extended many times to
+include additional information that various implementors found
+necessary.
+This section describes the variant implemented by the tar command
+included in
+.At v7 ,
+which seems to be the earliest widely-used version of the tar program.
+.Pp
+The header record for an old-style
+.Nm
+archive consists of the following:
+.Bd -literal -offset indent
+struct header_old_tar {
+ char name[100];
+ char mode[8];
+ char uid[8];
+ char gid[8];
+ char size[12];
+ char mtime[12];
+ char checksum[8];
+ char linkflag[1];
+ char linkname[100];
+ char pad[255];
+};
+.Ed
+All unused bytes in the header record are filled with nulls.
+.Bl -tag -width indent
+.It Va name
+Pathname, stored as a null-terminated string.
+Early tar implementations only stored regular files (including
+hardlinks to those files).
+One common early convention used a trailing "/" character to indicate
+a directory name, allowing directory permissions and owner information
+to be archived and restored.
+.It Va mode
+File mode, stored as an octal number in ASCII.
+.It Va uid , Va gid
+User id and group id of owner, as octal numbers in ASCII.
+.It Va size
+Size of file, as octal number in ASCII.
+For regular files only, this indicates the amount of data
+that follows the header.
+In particular, this field was ignored by early tar implementations
+when extracting hardlinks.
+Modern writers should always store a zero length for hardlink entries.
+.It Va mtime
+Modification time of file, as an octal number in ASCII.
+This indicates the number of seconds since the start of the epoch,
+00:00:00 UTC January 1, 1970.
+Note that negative values should be avoided
+here, as they are handled inconsistently.
+.It Va checksum
+Header checksum, stored as an octal number in ASCII.
+To compute the checksum, set the checksum field to all spaces,
+then sum all bytes in the header using unsigned arithmetic.
+This field should be stored as six octal digits followed by a null and a space
+character.
+Note that many early implementations of tar used signed arithmetic
+for the checksum field, which can cause interoperability problems
+when transferring archives between systems.
+Modern robust readers compute the checksum both ways and accept the
+header if either computation matches.
+.It Va linkflag , Va linkname
+In order to preserve hardlinks and conserve tape, a file
+with multiple links is only written to the archive the first
+time it is encountered.
+The next time it is encountered, the
+.Va linkflag
+is set to an ASCII
+.Sq 1
+and the
+.Va linkname
+field holds the first name under which this file appears.
+(Note that regular files have a null value in the
+.Va linkflag
+field.)
+.El
+.Pp
+Early tar implementations varied in how they terminated these fields.
+The tar command in
+.At v7
+used the following conventions (this is also documented in early BSD manpages):
+the pathname must be null-terminated;
+the mode, uid, and gid fields must end in a space and a null byte;
+the size and mtime fields must end in a space;
+the checksum is terminated by a null and a space.
+Early implementations filled the numeric fields with leading spaces.
+This seems to have been common practice until the
+.St -p1003.1-88
+standard was released.
+For best portability, modern implementations should fill the numeric
+fields with leading zeros.
+.Ss Pre-POSIX Archives
+An early draft of
+.St -p1003.1-88
+served as the basis for John Gilmore's
+.Nm pdtar
+program and many system implementations from the late 1980s
+and early 1990s.
+These archives generally follow the POSIX ustar
+format described below with the following variations:
+.Bl -bullet -compact -width indent
+.It
+The magic value consists of the five characters
+.Dq ustar
+followed by a space.
+The version field contains a space character followed by a null.
+.It
+The numeric fields are generally filled with leading spaces
+(not leading zeros as recommended in the final standard).
+.It
+The prefix field is often not used, limiting pathnames to
+the 100 characters of old-style archives.
+.El
+.Ss POSIX ustar Archives
+.St -p1003.1-88
+defined a standard tar file format to be read and written
+by compliant implementations of
+.Xr tar 1 .
+This format is often called the
+.Dq ustar
+format, after the magic value used
+in the header.
+(The name is an acronym for
+.Dq Unix Standard TAR . )
+It extends the historic format with new fields:
+.Bd -literal -offset indent
+struct header_posix_ustar {
+ char name[100];
+ char mode[8];
+ char uid[8];
+ char gid[8];
+ char size[12];
+ char mtime[12];
+ char checksum[8];
+ char typeflag[1];
+ char linkname[100];
+ char magic[6];
+ char version[2];
+ char uname[32];
+ char gname[32];
+ char devmajor[8];
+ char devminor[8];
+ char prefix[155];
+ char pad[12];
+};
+.Ed
+.Bl -tag -width indent
+.It Va typeflag
+Type of entry.
+POSIX extended the earlier
+.Va linkflag
+field with several new type values:
+.Bl -tag -width indent -compact
+.It Dq 0
+Regular file.
+NUL should be treated as a synonym, for compatibility purposes.
+.It Dq 1
+Hard link.
+.It Dq 2
+Symbolic link.
+.It Dq 3
+Character device node.
+.It Dq 4
+Block device node.
+.It Dq 5
+Directory.
+.It Dq 6
+FIFO node.
+.It Dq 7
+Reserved.
+.It Other
+A POSIX-compliant implementation must treat any unrecognized typeflag value
+as a regular file.
+In particular, writers should ensure that all entries
+have a valid filename so that they can be restored by readers that do not
+support the corresponding extension.
+Uppercase letters "A" through "Z" are reserved for custom extensions.
+Note that sockets and whiteout entries are not archivable.
+.El
+It is worth noting that the
+.Va size
+field, in particular, has different meanings depending on the type.
+For regular files, of course, it indicates the amount of data
+following the header.
+For directories, it may be used to indicate the total size of all
+files in the directory, for use by operating systems that pre-allocate
+directory space.
+For all other types, it should be set to zero by writers and ignored
+by readers.
+.It Va magic
+Contains the magic value
+.Dq ustar
+followed by a NUL byte to indicate that this is a POSIX standard archive.
+Full compliance requires the uname and gname fields be properly set.
+.It Va version
+Version.
+This should be
+.Dq 00
+(two copies of the ASCII digit zero) for POSIX standard archives.
+.It Va uname , Va gname
+User and group names, as null-terminated ASCII strings.
+These should be used in preference to the uid/gid values
+when they are set and the corresponding names exist on
+the system.
+.It Va devmajor , Va devminor
+Major and minor numbers for character device or block device entry.
+.It Va name , Va prefix
+If the pathname is too long to fit in the 100 bytes provided by the standard
+format, it can be split at any
+.Pa /
+character with the first portion going into the prefix field.
+If the prefix field is not empty, the reader will prepend
+the prefix value and a
+.Pa /
+character to the regular name field to obtain the full pathname.
+The standard does not require a trailing
+.Pa /
+character on directory names, though most implementations still
+include this for compatibility reasons.
+.El
+.Pp
+Note that all unused bytes must be set to
+.Dv NUL .
+.Pp
+Field termination is specified slightly differently by POSIX
+than by previous implementations.
+The
+.Va magic ,
+.Va uname ,
+and
+.Va gname
+fields must have a trailing
+.Dv NUL .
+The
+.Va pathname ,
+.Va linkname ,
+and
+.Va prefix
+fields must have a trailing
+.Dv NUL
+unless they fill the entire field.
+(In particular, it is possible to store a 256-character pathname if it
+happens to have a
+.Pa /
+as the 156th character.)
+POSIX requires numeric fields to be zero-padded in the front, and requires
+them to be terminated with either space or
+.Dv NUL
+characters.
+.Pp
+Currently, most tar implementations comply with the ustar
+format, occasionally extending it by adding new fields to the
+blank area at the end of the header record.
+.Ss Numeric Extensions
+There have been several attempts to extend the range of sizes
+or times supported by modifying how numbers are stored in the
+header.
+.Pp
+One obvious extension to increase the size of files is to
+eliminate the terminating characters from the various
+numeric fields.
+For example, the standard only allows the size field to contain
+11 octal digits, reserving the twelfth byte for a trailing
+NUL character.
+Allowing 12 octal digits allows file sizes up to 64 GB.
+.Pp
+Another extension, utilized by GNU tar, star, and other newer
+.Nm
+implementations, permits binary numbers in the standard numeric fields.
+This is flagged by setting the high bit of the first byte.
+The remainder of the field is treated as a signed twos-complement
+value.
+This permits 95-bit values for the length and time fields
+and 63-bit values for the uid, gid, and device numbers.
+In particular, this provides a consistent way to handle
+negative time values.
+GNU tar supports this extension for the
+length, mtime, ctime, and atime fields.
+Joerg Schilling's star program and the libarchive library support
+this extension for all numeric fields.
+Note that this extension is largely obsoleted by the extended
+attribute record provided by the pax interchange format.
+.Pp
+Another early GNU extension allowed base-64 values rather than octal.
+This extension was short-lived and is no longer supported by any
+implementation.
+.Ss Pax Interchange Format
+There are many attributes that cannot be portably stored in a
+POSIX ustar archive.
+.St -p1003.1-2001
+defined a
+.Dq pax interchange format
+that uses two new types of entries to hold text-formatted
+metadata that applies to following entries.
+Note that a pax interchange format archive is a ustar archive in every
+respect.
+The new data is stored in ustar-compatible archive entries that use the
+.Dq x
+or
+.Dq g
+typeflag.
+In particular, older implementations that do not fully support these
+extensions will extract the metadata into regular files, where the
+metadata can be examined as necessary.
+.Pp
+An entry in a pax interchange format archive consists of one or
+two standard ustar entries, each with its own header and data.
+The first optional entry stores the extended attributes
+for the following entry.
+This optional first entry has an "x" typeflag and a size field that
+indicates the total size of the extended attributes.
+The extended attributes themselves are stored as a series of text-format
+lines encoded in the portable UTF-8 encoding.
+Each line consists of a decimal number, a space, a key string, an equals
+sign, a value string, and a new line.
+The decimal number indicates the length of the entire line, including the
+initial length field and the trailing newline.
+An example of such a field is:
+.Dl 25 ctime=1084839148.1212\en
+Keys in all lowercase are standard keys.
+Vendors can add their own keys by prefixing them with an all uppercase
+vendor name and a period.
+Note that, unlike the historic header, numeric values are stored using
+decimal, not octal.
+A description of some common keys follows:
+.Bl -tag -width indent
+.It Cm atime , Cm ctime , Cm mtime
+File access, inode change, and modification times.
+These fields can be negative or include a decimal point and a fractional value.
+.It Cm hdrcharset
+The character set used by the pax extension values.
+By default, all textual values in the pax extended attributes
+are assumed to be in UTF-8, including pathnames, user names,
+and group names.
+In some cases, it is not possible to translate local
+conventions into UTF-8.
+If this key is present and the value is the six-character ASCII string
+.Dq BINARY ,
+then all textual values are assumed to be in a platform-dependent
+multi-byte encoding.
+Note that there are only two valid values for this key:
+.Dq BINARY
+or
+.Dq ISO-IR\ 10646\ 2000\ UTF-8 .
+No other values are permitted by the standard, and
+the latter value should generally not be used as it is the
+default when this key is not specified.
+In particular, this flag should not be used as a general
+mechanism to allow filenames to be stored in arbitrary
+encodings.
+.It Cm uname , Cm uid , Cm gname , Cm gid
+User name, group name, and numeric UID and GID values.
+The user name and group name stored here are encoded in UTF8
+and can thus include non-ASCII characters.
+The UID and GID fields can be of arbitrary length.
+.It Cm linkpath
+The full path of the linked-to file.
+Note that this is encoded in UTF8 and can thus include non-ASCII characters.
+.It Cm path
+The full pathname of the entry.
+Note that this is encoded in UTF8 and can thus include non-ASCII characters.
+.It Cm realtime.* , Cm security.*
+These keys are reserved and may be used for future standardization.
+.It Cm size
+The size of the file.
+Note that there is no length limit on this field, allowing conforming
+archives to store files much larger than the historic 8GB limit.
+.It Cm SCHILY.*
+Vendor-specific attributes used by Joerg Schilling's
+.Nm star
+implementation.
+.It Cm SCHILY.acl.access , Cm SCHILY.acl.default
+Stores the access and default ACLs as textual strings in a format
+that is an extension of the format specified by POSIX.1e draft 17.
+In particular, each user or group access specification can include a fourth
+colon-separated field with the numeric UID or GID.
+This allows ACLs to be restored on systems that may not have complete
+user or group information available (such as when NIS/YP or LDAP services
+are temporarily unavailable).
+.It Cm SCHILY.devminor , Cm SCHILY.devmajor
+The full minor and major numbers for device nodes.
+.It Cm SCHILY.fflags
+The file flags.
+.It Cm SCHILY.realsize
+The full size of the file on disk.
+XXX explain? XXX
+.It Cm SCHILY.dev, Cm SCHILY.ino , Cm SCHILY.nlinks
+The device number, inode number, and link count for the entry.
+In particular, note that a pax interchange format archive using Joerg
+Schilling's
+.Cm SCHILY.*
+extensions can store all of the data from
+.Va struct stat .
+.It Cm LIBARCHIVE.*
+Vendor-specific attributes used by the
+.Nm libarchive
+library and programs that use it.
+.It Cm LIBARCHIVE.creationtime
+The time when the file was created.
+(This should not be confused with the POSIX
+.Dq ctime
+attribute, which refers to the time when the file
+metadata was last changed.)
+.It Cm LIBARCHIVE.xattr. Ns Ar namespace Ns . Ns Ar key
+Libarchive stores POSIX.1e-style extended attributes using
+keys of this form.
+The
+.Ar key
+value is URL-encoded:
+All non-ASCII characters and the two special characters
+.Dq =
+and
+.Dq %
+are encoded as
+.Dq %
+followed by two uppercase hexadecimal digits.
+The value of this key is the extended attribute value
+encoded in base 64.
+XXX Detail the base-64 format here XXX
+.It Cm VENDOR.*
+XXX document other vendor-specific extensions XXX
+.El
+.Pp
+Any values stored in an extended attribute override the corresponding
+values in the regular tar header.
+Note that compliant readers should ignore the regular fields when they
+are overridden.
+This is important, as existing archivers are known to store non-compliant
+values in the standard header fields in this situation.
+There are no limits on length for any of these fields.
+In particular, numeric fields can be arbitrarily large.
+All text fields are encoded in UTF8.
+Compliant writers should store only portable 7-bit ASCII characters in
+the standard ustar header and use extended
+attributes whenever a text value contains non-ASCII characters.
+.Pp
+In addition to the
+.Cm x
+entry described above, the pax interchange format
+also supports a
+.Cm g
+entry.
+The
+.Cm g
+entry is identical in format, but specifies attributes that serve as
+defaults for all subsequent archive entries.
+The
+.Cm g
+entry is not widely used.
+.Pp
+Besides the new
+.Cm x
+and
+.Cm g
+entries, the pax interchange format has a few other minor variations
+from the earlier ustar format.
+The most troubling one is that hardlinks are permitted to have
+data following them.
+This allows readers to restore any hardlink to a file without
+having to rewind the archive to find an earlier entry.
+However, it creates complications for robust readers, as it is no longer
+clear whether or not they should ignore the size field for hardlink entries.
+.Ss GNU Tar Archives
+The GNU tar program started with a pre-POSIX format similar to that
+described earlier and has extended it using several different mechanisms:
+It added new fields to the empty space in the header (some of which was later
+used by POSIX for conflicting purposes);
+it allowed the header to be continued over multiple records;
+and it defined new entries that modify following entries
+(similar in principle to the
+.Cm x
+entry described above, but each GNU special entry is single-purpose,
+unlike the general-purpose
+.Cm x
+entry).
+As a result, GNU tar archives are not POSIX compatible, although
+more lenient POSIX-compliant readers can successfully extract most
+GNU tar archives.
+.Bd -literal -offset indent
+struct header_gnu_tar {
+ char name[100];
+ char mode[8];
+ char uid[8];
+ char gid[8];
+ char size[12];
+ char mtime[12];
+ char checksum[8];
+ char typeflag[1];
+ char linkname[100];
+ char magic[6];
+ char version[2];
+ char uname[32];
+ char gname[32];
+ char devmajor[8];
+ char devminor[8];
+ char atime[12];
+ char ctime[12];
+ char offset[12];
+ char longnames[4];
+ char unused[1];
+ struct {
+ char offset[12];
+ char numbytes[12];
+ } sparse[4];
+ char isextended[1];
+ char realsize[12];
+ char pad[17];
+};
+.Ed
+.Bl -tag -width indent
+.It Va typeflag
+GNU tar uses the following special entry types, in addition to
+those defined by POSIX:
+.Bl -tag -width indent
+.It "7"
+GNU tar treats type "7" records identically to type "0" records,
+except on one obscure RTOS where they are used to indicate the
+pre-allocation of a contiguous file on disk.
+.It "D"
+This indicates a directory entry.
+Unlike the POSIX-standard "5"
+typeflag, the header is followed by data records listing the names
+of files in this directory.
+Each name is preceded by an ASCII "Y"
+if the file is stored in this archive or "N" if the file is not
+stored in this archive.
+Each name is terminated with a null, and
+an extra null marks the end of the name list.
+The purpose of this
+entry is to support incremental backups; a program restoring from
+such an archive may wish to delete files on disk that did not exist
+in the directory when the archive was made.
+.Pp
+Note that the "D" typeflag specifically violates POSIX, which requires
+that unrecognized typeflags be restored as normal files.
+In this case, restoring the "D" entry as a file could interfere
+with subsequent creation of the like-named directory.
+.It "K"
+The data for this entry is a long linkname for the following regular entry.
+.It "L"
+The data for this entry is a long pathname for the following regular entry.
+.It "M"
+This is a continuation of the last file on the previous volume.
+GNU multi-volume archives guarantee that each volume begins with a valid
+entry header.
+To ensure this, a file may be split, with part stored at the end of one volume,
+and part stored at the beginning of the next volume.
+The "M" typeflag indicates that this entry continues an existing file.
+Such entries can only occur as the first or second entry
+in an archive (the latter only if the first entry is a volume label).
+The
+.Va size
+field specifies the size of this entry.
+The
+.Va offset
+field at bytes 369-380 specifies the offset where this file fragment
+begins.
+The
+.Va realsize
+field specifies the total size of the file (which must equal
+.Va size
+plus
+.Va offset ) .
+When extracting, GNU tar checks that the header file name is the one it is
+expecting, that the header offset is in the correct sequence, and that
+the sum of offset and size is equal to realsize.
+.It "N"
+Type "N" records are no longer generated by GNU tar.
+They contained a
+list of files to be renamed or symlinked after extraction; this was
+originally used to support long names.
+The contents of this record
+are a text description of the operations to be done, in the form
+.Dq Rename %s to %s\en
+or
+.Dq Symlink %s to %s\en ;
+in either case, both
+filenames are escaped using K&R C syntax.
+Due to security concerns, "N" records are now generally ignored
+when reading archives.
+.It "S"
+This is a
+.Dq sparse
+regular file.
+Sparse files are stored as a series of fragments.
+The header contains a list of fragment offset/length pairs.
+If more than four such entries are required, the header is
+extended as necessary with
+.Dq extra
+header extensions (an older format that is no longer used), or
+.Dq sparse
+extensions.
+.It "V"
+The
+.Va name
+field should be interpreted as a tape/volume header name.
+This entry should generally be ignored on extraction.
+.El
+.It Va magic
+The magic field holds the five characters
+.Dq ustar
+followed by a space.
+Note that POSIX ustar archives have a trailing null.
+.It Va version
+The version field holds a space character followed by a null.
+Note that POSIX ustar archives use two copies of the ASCII digit
+.Dq 0 .
+.It Va atime , Va ctime
+The time the file was last accessed and the time of
+last change of file information, stored in octal as with
+.Va mtime .
+.It Va longnames
+This field is apparently no longer used.
+.It Sparse Va offset / Va numbytes
+Each such structure specifies a single fragment of a sparse
+file.
+The two fields store values as octal numbers.
+The fragments are each padded to a multiple of 512 bytes
+in the archive.
+On extraction, the list of fragments is collected from the
+header (including any extension headers), and the data
+is then read and written to the file at appropriate offsets.
+.It Va isextended
+If this is set to non-zero, the header will be followed by additional
+.Dq sparse header
+records.
+Each such record contains information about as many as 21 additional
+sparse blocks as shown here:
+.Bd -literal -offset indent
+struct gnu_sparse_header {
+ struct {
+ char offset[12];
+ char numbytes[12];
+ } sparse[21];
+ char isextended[1];
+ char padding[7];
+};
+.Ed
+.It Va realsize
+A binary representation of the file's complete size, with a much larger range
+than the POSIX file size.
+In particular, with
+.Cm M
+type files, the current entry is only a portion of the file.
+In that case, the POSIX size field will indicate the size of this
+entry; the
+.Va realsize
+field will indicate the total size of the file.
+.El
+.Ss GNU tar pax archives
+GNU tar 1.14 (XXX check this XXX) and later will write
+pax interchange format archives when you specify the
+.Fl -posix
+flag.
+This format follows the pax interchange format closely,
+using some
+.Cm SCHILY
+tags and introducing new keywords to store sparse file information.
+There have been three iterations of the sparse file support, referred to
+as
+.Dq 0.0 ,
+.Dq 0.1 ,
+and
+.Dq 1.0 .
+.Bl -tag -width indent
+.It Cm GNU.sparse.numblocks , Cm GNU.sparse.offset , Cm GNU.sparse.numbytes , Cm GNU.sparse.size
+The
+.Dq 0.0
+format used an initial
+.Cm GNU.sparse.numblocks
+attribute to indicate the number of blocks in the file, a pair of
+.Cm GNU.sparse.offset
+and
+.Cm GNU.sparse.numbytes
+to indicate the offset and size of each block,
+and a single
+.Cm GNU.sparse.size
+to indicate the full size of the file.
+This is not the same as the size in the tar header because the
+latter value does not include the size of any holes.
+This format required that the order of attributes be preserved and
+relied on readers accepting multiple appearances of the same attribute
+names, which is not officially permitted by the standards.
+.It Cm GNU.sparse.map
+The
+.Dq 0.1
+format used a single attribute that stored a comma-separated
+list of decimal numbers.
+Each pair of numbers indicated the offset and size, respectively,
+of a block of data.
+This does not work well if the archive is extracted by an archiver
+that does not recognize this extension, since many pax implementations
+simply discard unrecognized attributes.
+.It Cm GNU.sparse.major , Cm GNU.sparse.minor , Cm GNU.sparse.name , Cm GNU.sparse.realsize
+The
+.Dq 1.0
+format stores the sparse block map in one or more 512-byte blocks
+prepended to the file data in the entry body.
+The pax attributes indicate the existence of this map
+(via the
+.Cm GNU.sparse.major
+and
+.Cm GNU.sparse.minor
+fields)
+and the full size of the file.
+The
+.Cm GNU.sparse.name
+holds the true name of the file.
+To avoid confusion, the name stored in the regular tar header
+is a modified name so that extraction errors will be apparent
+to users.
+.El
+.Ss Solaris Tar
+XXX More Details Needed XXX
+.Pp
+Solaris tar (beginning with SunOS XXX 5.7 ?? XXX) supports an
+.Dq extended
+format that is fundamentally similar to pax interchange format,
+with the following differences:
+.Bl -bullet -compact -width indent
+.It
+Extended attributes are stored in an entry whose type is
+.Cm X ,
+not
+.Cm x ,
+as used by pax interchange format.
+The detailed format of this entry appears to be the same
+as detailed above for the
+.Cm x
+entry.
+.It
+An additional
+.Cm A
+header is used to store an ACL for the following regular entry.
+The body of this entry contains a seven-digit octal number
+followed by a zero byte, followed by the
+textual ACL description.
+The octal value is the number of ACL entries
+plus a constant that indicates the ACL type: 01000000
+for POSIX.1e ACLs and 03000000 for NFSv4 ACLs.
+.El
+.Ss AIX Tar
+XXX More details needed XXX
+.Pp
+AIX Tar uses a ustar-formatted header with the type
+.Cm A
+for storing coded ACL information.
+Unlike the Solaris format, AIX tar writes this header after the
+regular file body to which it applies.
+The pathname in this header is either
+.Cm NFS4
+or
+.Cm AIXC
+to indicate the type of ACL stored.
+The actual ACL is stored in platform-specific binary format.
+.Ss Mac OS X Tar
+The tar distributed with Apple's Mac OS X stores most regular files
+as two separate files in the tar archive.
+The two files have the same name except that the first
+one has
+.Dq ._
+prepended to the last path element.
+This special file stores an AppleDouble-encoded
+binary blob with additional metadata about the second file,
+including ACL, extended attributes, and resources.
+To recreate the original file on disk, each
+separate file can be extracted and the Mac OS X
+.Fn copyfile
+function can be used to unpack the separate
+metadata file and apply it to th regular file.
+Conversely, the same function provides a
+.Dq pack
+option to encode the extended metadata from
+a file into a separate file whose contents
+can then be put into a tar archive.
+.Pp
+Note that the Apple extended attributes interact
+badly with long filenames.
+Since each file is stored with the full name,
+a separate set of extensions needs to be included
+in the archive for each one, doubling the overhead
+required for files with long names.
+.Ss Summary of tar type codes
+The following list is a condensed summary of the type codes
+used in tar header records generated by different tar implementations.
+More details about specific implementations can be found above:
+.Bl -tag -compact -width XXX
+.It NUL
+Early tar programs stored a zero byte for regular files.
+.It Cm 0
+POSIX standard type code for a regular file.
+.It Cm 1
+POSIX standard type code for a hard link description.
+.It Cm 2
+POSIX standard type code for a symbolic link description.
+.It Cm 3
+POSIX standard type code for a character device node.
+.It Cm 4
+POSIX standard type code for a block device node.
+.It Cm 5
+POSIX standard type code for a directory.
+.It Cm 6
+POSIX standard type code for a FIFO.
+.It Cm 7
+POSIX reserved.
+.It Cm 7
+GNU tar used for pre-allocated files on some systems.
+.It Cm A
+Solaris tar ACL description stored prior to a regular file header.
+.It Cm A
+AIX tar ACL description stored after the file body.
+.It Cm D
+GNU tar directory dump.
+.It Cm K
+GNU tar long linkname for the following header.
+.It Cm L
+GNU tar long pathname for the following header.
+.It Cm M
+GNU tar multivolume marker, indicating the file is a continuation of a file from the previous volume.
+.It Cm N
+GNU tar long filename support. Deprecated.
+.It Cm S
+GNU tar sparse regular file.
+.It Cm V
+GNU tar tape/volume header name.
+.It Cm X
+Solaris tar general-purpose extension header.
+.It Cm g
+POSIX pax interchange format global extensions.
+.It Cm x
+POSIX pax interchange format per-file extensions.
+.El
+.Sh SEE ALSO
+.Xr ar 1 ,
+.Xr pax 1 ,
+.Xr tar 1
+.Sh STANDARDS
+The
+.Nm tar
+utility is no longer a part of POSIX or the Single Unix Standard.
+It last appeared in
+.St -susv2 .
+It has been supplanted in subsequent standards by
+.Xr pax 1 .
+The ustar format is currently part of the specification for the
+.Xr pax 1
+utility.
+The pax interchange file format is new with
+.St -p1003.1-2001 .
+.Sh HISTORY
+A
+.Nm tar
+command appeared in Seventh Edition Unix, which was released in January, 1979.
+It replaced the
+.Nm tp
+program from Fourth Edition Unix which in turn replaced the
+.Nm tap
+program from First Edition Unix.
+John Gilmore's
+.Nm pdtar
+public-domain implementation (circa 1987) was highly influential
+and formed the basis of
+.Nm GNU tar
+(circa 1988).
+Joerg Shilling's
+.Nm star
+archiver is another open-source (GPL) archiver (originally developed
+circa 1985) which features complete support for pax interchange
+format.
+.Pp
+This documentation was written as part of the
+.Nm libarchive
+and
+.Nm bsdtar
+project by
+.An Tim Kientzle Aq kientzle@FreeBSD.org .