Diffstat (limited to 'Utilities/cmlibarchive/libarchive/tar.5')
-rw-r--r-- Utilities/cmlibarchive/libarchive/tar.5 | 352
1 files changed, 241 insertions, 111 deletions
diff --git a/Utilities/cmlibarchive/libarchive/tar.5 b/Utilities/cmlibarchive/libarchive/tar.5
index 853ddab..65875bd 100644
--- a/Utilities/cmlibarchive/libarchive/tar.5
+++ b/Utilities/cmlibarchive/libarchive/tar.5
@@ -22,10 +22,10 @@
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
-.\" $FreeBSD: src/lib/libarchive/tar.5,v 1.18 2008/05/26 17:00:23 kientzle Exp $
+.\" $FreeBSD: head/lib/libarchive/tar.5 201077 2009-12-28 01:50:23Z kientzle $
.\"
-.Dd April 19, 2009
-.Dt tar 5
+.Dd December 27, 2009
+.Dt TAR 5
.Os
.Sh NAME
.Nm tar
@@ -55,8 +55,11 @@ number of records with each I/O operation.
These
.Dq blocks
are always a multiple of the record size.
-The most common block size\(emand the maximum supported by historic
-implementations\(emis 10240 bytes or 20 records.
+The maximum block size supported by early
+implementations was 10240 bytes or 20 records.
+This is still the default for most implementations
+although block sizes of 1MiB (2048 records) or larger are
+commonly used with modern high-speed tape drives.
(Note: the terms
.Dq block
and
@@ -78,16 +81,16 @@ The header record for an old-style
archive consists of the following:
.Bd -literal -offset indent
struct header_old_tar {
- char name[100];
- char mode[8];
- char uid[8];
- char gid[8];
- char size[12];
- char mtime[12];
- char checksum[8];
- char linkflag[1];
- char linkname[100];
- char pad[255];
+ char name[100];
+ char mode[8];
+ char uid[8];
+ char gid[8];
+ char size[12];
+ char mtime[12];
+ char checksum[8];
+ char linkflag[1];
+ char linkname[100];
+ char pad[255];
};
.Ed
All unused bytes in the header record are filled with nulls.
@@ -168,9 +171,9 @@ These archives generally follow the POSIX ustar
format described below with the following variations:
.Bl -bullet -compact -width indent
.It
-The magic value is
-.Dq ustar\ \&
-(note the following space).
+The magic value consists of the five characters
+.Dq ustar
+followed by a space.
The version field contains a space character followed by a null.
.It
The numeric fields are generally filled with leading spaces
@@ -193,23 +196,23 @@ in the header.
It extends the historic format with new fields:
.Bd -literal -offset indent
struct header_posix_ustar {
- char name[100];
- char mode[8];
- char uid[8];
- char gid[8];
- char size[12];
- char mtime[12];
- char checksum[8];
- char typeflag[1];
- char linkname[100];
- char magic[6];
- char version[2];
- char uname[32];
- char gname[32];
- char devmajor[8];
- char devminor[8];
- char prefix[155];
- char pad[12];
+ char name[100];
+ char mode[8];
+ char uid[8];
+ char gid[8];
+ char size[12];
+ char mtime[12];
+ char checksum[8];
+ char typeflag[1];
+ char linkname[100];
+ char magic[6];
+ char version[2];
+ char uname[32];
+ char gname[32];
+ char devmajor[8];
+ char devminor[8];
+ char prefix[155];
+ char pad[12];
};
.Ed
.Bl -tag -width indent
@@ -272,16 +275,19 @@ when they are set and the corresponding names exist on
the system.
.It Va devmajor , Va devminor
Major and minor numbers for character device or block device entry.
-.It Va prefix
-First part of pathname.
+.It Va name , Va prefix
If the pathname is too long to fit in the 100 bytes provided by the standard
format, it can be split at any
.Pa /
-character with the first portion going here.
+character with the first portion going into the prefix field.
If the prefix field is not empty, the reader will prepend
the prefix value and a
.Pa /
character to the regular name field to obtain the full pathname.
+The standard does not require a trailing
+.Pa /
+character on directory names, though most implementations still
+include this for compatibility reasons.
.El
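The prefix rule described above can be sketched in C as follows.
This is a minimal illustration, not code from libarchive: the names
field_len and ustar_full_path are hypothetical, and the caller is assumed
to supply an output buffer of at least 257 bytes
(155 + 1 + 100 + 1 for the terminating NUL).
It also assumes that a field which fills its entire width carries no
terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of a header field that is NUL-terminated unless it fills the field. */
static size_t field_len(const char *field, size_t max)
{
    const char *end = memchr(field, '\0', max);
    return end ? (size_t)(end - field) : max;
}

/*
 * Hypothetical helper: build the full pathname from the ustar
 * prefix[155] and name[100] fields.  "out" must hold at least 257 bytes.
 * Returns the length of the resulting string.
 */
size_t ustar_full_path(const char prefix[155], const char name[100], char *out)
{
    size_t plen = field_len(prefix, 155);
    size_t nlen = field_len(name, 100);
    size_t pos = 0;

    assert(out != NULL);
    if (plen > 0) {
        /* Non-empty prefix: prepend it, followed by a '/' separator. */
        memcpy(out, prefix, plen);
        pos = plen;
        out[pos++] = '/';
    }
    memcpy(out + pos, name, nlen);
    pos += nlen;
    out[pos] = '\0';
    return pos;
}
```

With a prefix of usr/share/doc and a name of README, the helper produces
usr/share/doc/README; with an empty prefix it returns the name unchanged.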
.Pp
Note that all unused bytes must be set to
@@ -308,7 +314,7 @@ unless they fill the entire field.
happens to have a
.Pa /
as the 156th character.)
-POSIX requires numeric fields to be zero-padded in the front, and allows
+POSIX requires numeric fields to be zero-padded in the front, and requires
them to be terminated with either space or
.Dv NUL
characters.
@@ -316,6 +322,39 @@ characters.
Currently, most tar implementations comply with the ustar
format, occasionally extending it by adding new fields to the
blank area at the end of the header record.
+.Ss Numeric Extensions
+There have been several attempts to extend the range of sizes
+or times supported by modifying how numbers are stored in the
+header.
+.Pp
+One obvious extension to increase the size of files is to
+eliminate the terminating characters from the various
+numeric fields.
+For example, the standard only allows the size field to contain
+11 octal digits, reserving the twelfth byte for a trailing
+NUL character.
+Using all 12 bytes for octal digits allows file sizes up to 64 GiB.
+.Pp
+Another extension, utilized by GNU tar, star, and other newer
+.Nm
+implementations, permits binary numbers in the standard numeric fields.
+This is flagged by setting the high bit of the first byte.
+The remainder of the field is treated as a signed twos-complement
+value.
+This permits 95-bit values for the length and time fields
+and 63-bit values for the uid, gid, and device numbers.
+In particular, this provides a consistent way to handle
+negative time values.
+GNU tar supports this extension for the
+length, mtime, ctime, and atime fields.
+Joerg Schilling's star program and the libarchive library support
+this extension for all numeric fields.
+Note that this extension is largely obsoleted by the extended
+attribute record provided by the pax interchange format.
+.Pp
+Another early GNU extension allowed base-64 values rather than octal.
+This extension was short-lived and is no longer supported by any
+implementation.
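As an illustration of the numeric extensions described above, the
following C sketch decodes a numeric header field that holds either
octal digits or a binary (base-256) value.
It is an assumption-laden example, not the libarchive or GNU tar
implementation: the function names are hypothetical, the result is
limited to 64 bits (wider 95-bit values are silently truncated), and the
final conversion to a signed type relies on the twos-complement behavior
found on mainstream compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Octal form: optional leading spaces, then digits up to a space, NUL, or field end. */
static int64_t parse_octal(const unsigned char *p, size_t len)
{
    int64_t v = 0;
    size_t i = 0;

    while (i < len && p[i] == ' ')
        i++;
    for (; i < len && p[i] >= '0' && p[i] <= '7'; i++)
        v = v * 8 + (p[i] - '0');
    return v;
}

/* Binary form: high bit of byte 0 is the flag, the rest is big-endian twos-complement. */
static int64_t parse_base256(const unsigned char *p, size_t len)
{
    uint64_t v = p[0] & 0x7f;          /* drop the flag bit */
    size_t i;

    if (v & 0x40)                      /* bit 6 is the sign bit */
        v |= ~(uint64_t)0x7f;          /* sign-extend to 64 bits */
    for (i = 1; i < len; i++)
        v = (v << 8) | p[i];           /* high-order bytes shift out */
    return (int64_t)v;
}

/* Hypothetical entry point: pick the decoder based on the flag bit. */
int64_t tar_parse_number(const char *field, size_t len)
{
    const unsigned char *p = (const unsigned char *)field;

    assert(field != NULL);
    if (len == 0)
        return 0;
    if (p[0] & 0x80)
        return parse_base256(p, len);
    return parse_octal(p, len);
}
```

Because the octal loop stops only at a non-digit or at the end of the
field, the same routine also accepts the unterminated 12-digit form
mentioned earlier in this section.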
.Ss Pax Interchange Format
There are many attributes that cannot be portably stored in a
POSIX ustar archive.
@@ -359,6 +398,27 @@ A description of some common keys follows:
.It Cm atime , Cm ctime , Cm mtime
File access, inode change, and modification times.
These fields can be negative or include a decimal point and a fractional value.
+.It Cm hdrcharset
+The character set used by the pax extension values.
+By default, all textual values in the pax extended attributes
+are assumed to be in UTF-8, including pathnames, user names,
+and group names.
+In some cases, it is not possible to translate local
+conventions into UTF-8.
+If this key is present and the value is the six-character ASCII string
+.Dq BINARY ,
+then all textual values are assumed to be in a platform-dependent
+multi-byte encoding.
+Note that there are only two valid values for this key:
+.Dq BINARY
+or
+.Dq ISO-IR\ 10646\ 2000\ UTF-8 .
+No other values are permitted by the standard, and
+the latter value should generally not be used as it is the
+default when this key is not specified.
+In particular, this flag should not be used as a general
+mechanism to allow filenames to be stored in arbitrary
+encodings.
.It Cm uname , Cm uid , Cm gname , Cm gid
User name, group name, and numeric UID and GID values.
The user name and group name stored here are encoded in UTF-8
@@ -402,6 +462,16 @@ Schilling's
.Cm SCHILY.*
extensions can store all of the data from
.Va struct stat .
+.It Cm LIBARCHIVE.*
+Vendor-specific attributes used by the
+.Nm libarchive
+library and programs that use it.
+.It Cm LIBARCHIVE.creationtime
+The time when the file was created.
+(This should not be confused with the POSIX
+.Dq ctime
+attribute, which refers to the time when the file
+metadata was last changed.)
.It Cm LIBARCHIVE.xattr. Ns Ar namespace Ns . Ns Ar key
Libarchive stores POSIX.1e-style extended attributes using
keys of this form.
@@ -479,33 +549,33 @@ more lenient POSIX-compliant readers can successfully extract most
GNU tar archives.
.Bd -literal -offset indent
struct header_gnu_tar {
- char name[100];
- char mode[8];
- char uid[8];
- char gid[8];
- char size[12];
- char mtime[12];
- char checksum[8];
- char typeflag[1];
- char linkname[100];
- char magic[6];
- char version[2];
- char uname[32];
- char gname[32];
- char devmajor[8];
- char devminor[8];
- char atime[12];
- char ctime[12];
- char offset[12];
- char longnames[4];
- char unused[1];
- struct {
- char offset[12];
- char numbytes[12];
- } sparse[4];
- char isextended[1];
- char realsize[12];
- char pad[17];
+ char name[100];
+ char mode[8];
+ char uid[8];
+ char gid[8];
+ char size[12];
+ char mtime[12];
+ char checksum[8];
+ char typeflag[1];
+ char linkname[100];
+ char magic[6];
+ char version[2];
+ char uname[32];
+ char gname[32];
+ char devmajor[8];
+ char devminor[8];
+ char atime[12];
+ char ctime[12];
+ char offset[12];
+ char longnames[4];
+ char unused[1];
+ struct {
+ char offset[12];
+ char numbytes[12];
+ } sparse[4];
+ char isextended[1];
+ char realsize[12];
+ char pad[17];
};
.Ed
.Bl -tag -width indent
@@ -629,12 +699,12 @@ Each such record contains information about as many as 21 additional
sparse blocks as shown here:
.Bd -literal -offset indent
struct gnu_sparse_header {
- struct {
- char offset[12];
- char numbytes[12];
- } sparse[21];
- char isextended[1];
- char padding[7];
+ struct {
+ char offset[12];
+ char numbytes[12];
+ } sparse[21];
+ char isextended[1];
+ char padding[7];
};
.Ed
.It Va realsize
@@ -653,8 +723,11 @@ GNU tar 1.14 (XXX check this XXX) and later will write
pax interchange format archives when you specify the
.Fl -posix
flag.
-This format uses custom keywords to store sparse file information.
-There have been three iterations of this support, referred to
+This format follows the pax interchange format closely,
+using some
+.Cm SCHILY
+tags and introducing new keywords to store sparse file information.
+There have been three iterations of the sparse file support, referred to
as
.Dq 0.0 ,
.Dq 0.1 ,
@@ -729,7 +802,7 @@ entry.
.It
An additional
.Cm A
-entry is used to store an ACL for the following regular entry.
+header is used to store an ACL for the following regular entry.
The body of this entry contains a seven-digit octal number
followed by a zero byte, followed by the
textual ACL description.
@@ -739,46 +812,95 @@ for POSIX.1e ACLs and 03000000 for NFSv4 ACLs.
.El
.Ss AIX Tar
XXX More details needed XXX
+.Pp
+AIX Tar uses a ustar-formatted header with the type
+.Cm A
+for storing coded ACL information.
+Unlike the Solaris format, AIX tar writes this header after the
+regular file body to which it applies.
+The pathname in this header is either
+.Cm NFS4
+or
+.Cm AIXC
+to indicate the type of ACL stored.
+The actual ACL is stored in platform-specific binary format.
.Ss Mac OS X Tar
The tar distributed with Apple's Mac OS X stores most regular files
-as two separate entries in the tar archive.
-The two entries have the same name except that the first
+as two separate files in the tar archive.
+The two files have the same name except that the first
one has
.Dq ._
-added to the beginning of the name.
-This first entry stores the
-.Dq resource fork
-with additional attributes for the file.
-The Mac OS X
-.Fn CopyFile
-API is used to separate a file on disk into separate
-resource and data streams and to reassemble those separate
-streams when the file is restored to disk.
-.Ss Other Extensions
-One obvious extension to increase the size of files is to
-eliminate the terminating characters from the various
-numeric fields.
-For example, the standard only allows the size field to contain
-11 octal digits, reserving the twelfth byte for a trailing
-NUL character.
-Allowing 12 octal digits allows file sizes up to 64 GB.
+prepended to the last path element.
+This special file stores an AppleDouble-encoded
+binary blob with additional metadata about the second file,
+including ACL, extended attributes, and resources.
+To recreate the original file on disk, each
+separate file can be extracted and the Mac OS X
+.Fn copyfile
+function can be used to unpack the separate
+metadata file and apply it to the regular file.
+Conversely, the same function provides a
+.Dq pack
+option to encode the extended metadata from
+a file into a separate file whose contents
+can then be put into a tar archive.
.Pp
-Another extension, utilized by GNU tar, star, and other newer
-.Nm
-implementations, permits binary numbers in the standard numeric fields.
-This is flagged by setting the high bit of the first byte.
-This permits 95-bit values for the length and time fields
-and 63-bit values for the uid, gid, and device numbers.
-GNU tar supports this extension for the
-length, mtime, ctime, and atime fields.
-Joerg Schilling's star program supports this extension for
-all numeric fields.
-Note that this extension is largely obsoleted by the extended attribute
-record provided by the pax interchange format.
-.Pp
-Another early GNU extension allowed base-64 values rather than octal.
-This extension was short-lived and is no longer supported by any
-implementation.
+Note that the Apple extended attributes interact
+badly with long filenames.
+Since each file is stored with the full name,
+a separate set of extensions needs to be included
+in the archive for each one, doubling the overhead
+required for files with long names.
+.Ss Summary of tar type codes
+The following list is a condensed summary of the type codes
+used in tar header records generated by different tar implementations.
+More details about specific implementations can be found above:
+.Bl -tag -compact -width XXX
+.It NUL
+Early tar programs stored a zero byte for regular files.
+.It Cm 0
+POSIX standard type code for a regular file.
+.It Cm 1
+POSIX standard type code for a hard link description.
+.It Cm 2
+POSIX standard type code for a symbolic link description.
+.It Cm 3
+POSIX standard type code for a character device node.
+.It Cm 4
+POSIX standard type code for a block device node.
+.It Cm 5
+POSIX standard type code for a directory.
+.It Cm 6
+POSIX standard type code for a FIFO.
+.It Cm 7
+POSIX reserved.
+.It Cm 7
+GNU tar used for pre-allocated files on some systems.
+.It Cm A
+Solaris tar ACL description stored prior to a regular file header.
+.It Cm A
+AIX tar ACL description stored after the file body.
+.It Cm D
+GNU tar directory dump.
+.It Cm K
+GNU tar long linkname for the following header.
+.It Cm L
+GNU tar long pathname for the following header.
+.It Cm M
+GNU tar multivolume marker, indicating the file is a continuation of a file from the previous volume.
+.It Cm N
+GNU tar long filename support. Deprecated.
+.It Cm S
+GNU tar sparse regular file.
+.It Cm V
+GNU tar tape/volume header name.
+.It Cm X
+Solaris tar general-purpose extension header.
+.It Cm g
+POSIX pax interchange format global extensions.
+.It Cm x
+POSIX pax interchange format per-file extensions.
+.El
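A reader might dispatch on the type codes summarized above roughly as in
the following C sketch.
The enum tar_kind and the function tar_classify are hypothetical names
introduced for this example only.
The typeflag offset of 156 follows from the field sizes in the header
structures shown earlier (100 + 8 + 8 + 8 + 12 + 12 + 8).
Code 7 is treated as unknown here because the summary lists it both as
reserved by POSIX and as a GNU tar extension.

```c
#include <assert.h>
#include <stddef.h>

/* typeflag follows name, mode, uid, gid, size, mtime, and checksum. */
enum { TAR_TYPEFLAG_OFFSET = 156 };

/* Hypothetical classification used only by this sketch. */
enum tar_kind {
    TAR_KIND_REGULAR,
    TAR_KIND_HARDLINK,
    TAR_KIND_SYMLINK,
    TAR_KIND_CHARDEV,
    TAR_KIND_BLOCKDEV,
    TAR_KIND_DIRECTORY,
    TAR_KIND_FIFO,
    TAR_KIND_PAX_EXTENSION,
    TAR_KIND_VENDOR_EXTENSION,
    TAR_KIND_UNKNOWN
};

enum tar_kind tar_classify(const char header[512])
{
    assert(header != NULL);
    switch (header[TAR_TYPEFLAG_OFFSET]) {
    case '\0':                     /* early tar programs */
    case '0': return TAR_KIND_REGULAR;
    case '1': return TAR_KIND_HARDLINK;
    case '2': return TAR_KIND_SYMLINK;
    case '3': return TAR_KIND_CHARDEV;
    case '4': return TAR_KIND_BLOCKDEV;
    case '5': return TAR_KIND_DIRECTORY;
    case '6': return TAR_KIND_FIFO;
    case 'g':
    case 'x': return TAR_KIND_PAX_EXTENSION;
    case 'A': case 'D': case 'K': case 'L': case 'M':
    case 'N': case 'S': case 'V': case 'X':
        return TAR_KIND_VENDOR_EXTENSION;   /* Solaris, AIX, or GNU tar */
    default:
        return TAR_KIND_UNKNOWN;            /* includes the ambiguous '7' */
    }
}
```

A real reader would need more context than the type code alone; for
example, the
.Cm A
code has different meanings and placement in Solaris and AIX archives.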
.Sh SEE ALSO
.Xr ar 1 ,
.Xr pax 1 ,
@@ -809,9 +931,17 @@ John Gilmore's
.Nm pdtar
public-domain implementation (circa 1987) was highly influential
and formed the basis of
-.Nm GNU tar .
+.Nm GNU tar
+(circa 1988).
Joerg Schilling's
.Nm star
archiver is another open-source (GPL) archiver (originally developed
circa 1985) which features complete support for pax interchange
format.
+.Pp
+This documentation was written as part of the
+.Nm libarchive
+and
+.Nm bsdtar
+project by
+.An Tim Kientzle Aq kientzle@FreeBSD.org .