CS615 -- Aspects of System Administration
Know A Unix Command: tar(1)
NAME
     tar -- tape archiver

SYNOPSIS
     tar [-]{crtux}[-014578befHhjklmOoPpqSvwXZz] [archive] [blocksize]
         [-C directory] [-s replstr] [-T file] [file ...]

DESCRIPTION
     The tar command creates, adds files to, or extracts files from an
     archive file in ``tar'' format.  A tar archive is often stored on a
     magnetic tape, but can be stored equally well on a floppy, CD-ROM, or
     in a regular disk file.
History and Command-Line Options Parsing

As the name suggests, tar(1) was originally created to archive files on magnetic tape. As per the manual page, the command first appeared in Version 7 AT&T UNIX. Since then, different implementations have been distributed, including the popular BSD-licensed libarchive and GPL-licensed GNU tar versions.

The older UNIX commands had not yet standardized on command-line options prefixed with a '-' (or, later, the --long-options). tar(1) (and a number of other commands) interpreted the first argument string as a list of single-letter options, and this behavior was retained for backwards compatibility. Many a sysadmin's muscle memory has been primed on typing, for example, "tar tvf file.tar", even though the dash-prefixed form works just as well. For consistency, we will use '-flags' throughout this document.

File Format

tar(1) operates on data in a specific archive format, the Uniform Standard Tape ARchive or UStar format, described in the tar(5) manual page. Following its history of being used with tape drives, this format is a series of 512-byte records. Let's look at a tar(5) file:

    $ wget -q http://ftp.gnu.org/gnu/tar/tar-latest.tar.gz
    $ gzip -d tar-latest.tar.gz
    $ file tar-latest.tar
    tar-latest.tar: POSIX tar archive
    $ hexdump -c tar-latest.tar | more
    0000000   t   a   r   -   1   .   2   8   /  \0  \0  \0  \0  \0  \0  \0
    0000010  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
    *
    0000060  \0  \0  \0  \0   0   0   0   0   7   5   5  \0   0   0   0   1
    0000070   7   5   0  \0   0   0   0   1   7   5   0  \0   0   0   0   0
    0000080   0   0   0   0   0   0   0  \0   1   2   3   6   5   2   6   2
    0000090   3   6   6  \0   0   1   2   2   3   3  \0       5  \0  \0  \0
    00000a0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
    *
    0000100  \0   u   s   t   a   r  \0   0   0   g   r   a   y  \0  \0  \0
    0000110  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
    0000120  \0  \0  \0  \0  \0  \0  \0  \0  \0   g   r   a   y  \0  \0  \0
    0000130  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
    0000140  \0  \0  \0  \0  \0  \0  \0  \0  \0   0   0   0   0   0   0   0
    0000150  \0   0   0   0   0   0   0   0  \0  \0  \0  \0  \0  \0  \0  \0
    0000160  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
    *
    0000200   t   a   r   -   1   .   2   8   /   a   c   i   n   c   l   u
    0000210   d   e   .   m   4  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
    0000220  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
    *
    0000260  \0  \0  \0  \0   0   0   0   0   6   4   4  \0   0   0   0   1
    0000270   7   5   0  \0   0   0   0   1   7   5   0  \0   0   0   0   0
    --More--

Comparing the above output with the file format description from the tar(5) manual page, we can see that this archive contains a directory named tar-1.28, a file named tar-1.28/acinclude.m4, and so on. (Note: the file(1) command identified the correct file type by looking at the "magic" sequence at offset 257: u s t a r \0 0 0.)

Common Invocations

tar(1) has three primary use cases, each covered in its own section below: listing the contents of an archive (-t), extracting files from an archive (-x), and creating a new archive (-c).
tar(1) also frequently requires a filename to operate on; this may be the name of an actual file, a pathname for a tape device, or, per Unix convention, the string "-" to denote that it should operate on standard input (or, when creating an archive, standard output). If no file is specified via the -f flag, different implementations may default to standard input, to a default tape device such as /dev/nrst0, or to the value of an environment variable. Finally, you will frequently want to enable verbose output to see tar(1)'s progress by adding the -v flag.

Viewing Contents

One of the most common use cases for tar(1) is to extract software distributed in such an archive file. Before doing so, however, it may be desirable to view the contents of the archive. The most common invocation for this is:

    $ tar -tvf tar-latest.tar | more
    drwxr-xr-x gray/gray         0 2014-07-27 16:45 tar-1.28/
    -rw-r--r-- gray/gray      3126 2014-02-14 17:13 tar-1.28/acinclude.m4
    -rw-r--r-- gray/gray    206633 2013-09-24 03:19 tar-1.28/ChangeLog.1
    -rw-r--r-- gray/gray     86714 2014-07-27 16:34 tar-1.28/config.h.in
    -rw-r--r-- gray/gray      3359 2014-02-03 14:46 tar-1.28/Make.rules
    drwxr-xr-x gray/gray         0 2014-07-27 16:45 tar-1.28/doc/
    -rw-r--r-- gray/gray       439 2014-02-10 12:42 tar-1.28/doc/value.texi
    ...

Note how the entries listed here (file names, types, sizes, owner, etc.) reflect the data we saw in the raw hexdump(1) output above.

Extracting Contents

Extracting contents using the same example file would then look like this:

    $ tar -xf tar-latest.tar
    $ ls -l tar-1.28
    total 2299
    -rw-------  1 jschauma  professor   79584 Sep 29  2013 ABOUT-NLS
    -rw-------  1 jschauma  professor     601 Sep 24  2013 AUTHORS
    -rw-------  1 jschauma  professor   35147 Sep 24  2013 COPYING
    -rw-------  1 jschauma  professor  477038 Jul 27  2014 ChangeLog
    -rw-------  1 jschauma  professor  206633 Sep 24  2013 ChangeLog.1
    -rw-------  1 jschauma  professor   15752 Mar 24  2014 INSTALL
    -rw-------  1 jschauma  professor    3359 Feb  3  2014 Make.rules
    -rw-------  1 jschauma  professor    1243 Jul  7  2014 Makefile.am
    -rw-------  1 jschauma  professor   65796 Jul 27  2014 Makefile.in
    -rw-------  1 jschauma  professor   57810 Jul 27  2014 NEWS
    -rw-------  1 jschauma  professor    9868 Feb 10  2014 README
    -rw-------  1 jschauma  professor   20118 Feb 14  2014 THANKS
    -rw-------  1 jschauma  professor    2168 Feb 10  2014 TODO
    -rw-------  1 jschauma  professor    3126 Feb 14  2014 acinclude.m4
    ...

Note that we skipped the -v flag when extracting. Note also that tar(1) did not keep the ownership and permissions recorded in the archive: the extracted files belong to the current user, and their permissions were masked by the current umask.

Adding the -p flag allows tar(1) to preserve the permissions when extracting. Since setting/changing file ownership requires superuser privileges, the file owner will still remain the current user. (Different implementations may behave differently or require additional flags to (attempt to) retain the ownership as prescribed in the archive.)

Extracting Partial Contents

Sometimes you may wish to not extract all files from an archive. You can tell tar(1) which files you are looking for by specifying them on the command-line:

    $ tar -xvf tar-latest.tar tar-1.28/README tar-1.28/ChangeLog
    tar-1.28/README
    tar-1.28/ChangeLog
    $

You can specify wildcards to extract multiple files, although different implementations may require you to use different syntax. For example, using GNU tar(1), you could extract only the C source and header files like so:

    $ tar -xf tar-latest.tar --wildcards '*.[ch]'
    $ find tar-1.28 -name '*.[ch]' -print | wc -l
    417
    $

Note that the wildcards are single-quoted to prevent the current shell from interpreting the globs. If you had files ending in .c or .h in the current working directory and didn't quote the wildcards, the shell would have expanded the pattern to those local names first, and your command would have failed:

    $ touch foo.c bar.h
    $ tar -xf tar-latest.tar --wildcards *.[ch]
    tar: bar.h: Not found in archive
    tar: foo.c: Not found in archive
    tar: Exiting with failure status due to previous errors
    $

Creating an Archive

To create an archive, specify the name of the archive you wish to create and the files or directories to include:

    $ tar -cvf archive.tar /bin ../../some/path file1 dir2/
    tar: Removing leading `/' from member names
    /bin/
    /bin/bzmore
    /bin/ed
    ...
    tar: Removing leading `../../' from member names
    ../../some/path
    file1
    dir2/
    dir2/file
    dir2/subdir/
    ...

Note that tar(1) will descend into any directories given and will retain the resulting hierarchy. As a security precaution to prevent you from accidentally overwriting files when the archive is later extracted, it will remove pathname prefixes that point outside of the current working directory, both absolute and relative. That is, "/bin/ed" will become "bin/ed" in your archive; "../../some/path" would become "some/path". You can verify this by inspecting the contents of the archive you just created.
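The prefix stripping can be checked on a throwaway tree. The following is a minimal sketch, assuming a scratch directory created with mktemp(1) and hypothetical file names (src/file1, src/dir2/file); the warning tar(1) prints on stderr is discarded here.

```shell
#!/bin/sh
# Scratch tree with hypothetical names; nothing outside $work is touched.
work=$(mktemp -d)
mkdir -p "$work/src/dir2"
echo "hello" > "$work/src/file1"
echo "world" > "$work/src/dir2/file"

# Archive the tree by its absolute path. tar(1) is expected to warn on
# stderr that it is removing the leading '/' from member names.
tar -cf "$work/archive.tar" "$work/src" 2>/dev/null

# List the member names as they were actually stored, one per line.
members=$(tar -tf "$work/archive.tar")
printf '%s\n' "$members"

# Count stored names that still begin with '/'. grep exits non-zero when
# the count is 0, hence the "|| true".
absolute=$(printf '%s\n' "$members" | grep -c '^/' || true)
echo "members with a leading slash: $absolute"
```

If the implementation at hand strips prefixes as described above, the count printed on the last line is zero. Remove the scratch directory with rm -rf "$work" when done.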
Compression

Nowadays, tar(5) files are frequently compressed using, for example, Lempel-Ziv-Welch (compress(1)), LZ77-based DEFLATE (gzip(1)), or Burrows-Wheeler (bzip2(1)) coding, and most tar(1) implementations have support for compression built in. Most commonly, you can use the -z flag to enable gzip(1) compression handling, or the -j flag for bzip2(1) compression handling. Some implementations may also let you specify an arbitrary command to invoke to handle compression. Archives are now usually distributed using a filename ending in .tar.gz (or .tgz) to indicate gzip(1) compression, or .tar.bz2 (or .tbz) to indicate bzip2(1) compression.

In other words, what used to be separate commands:

    $ gzip -d tar-latest.tar.gz
    $ tar -tvf tar-latest.tar
    ...

or, more idiomatic:

    $ gzip -d -c tar-latest.tar.gz | tar -tvf -

can usually be handled by using:

    $ tar -ztvf tar-latest.tar.gz

and similarly for archive creation or extraction.

To illustrate the use of another, external compression program, consider the use of xz(1):

    $ tar -cf archive.tar.xz --use-compress-program=xz directory
    $ file archive.tar.xz
    archive.tar.xz: XZ compressed data
    $ tar -tvf archive.tar.xz --use-compress-program=xz
    directory/
    ...
    $ xz -c -d archive.tar.xz | tar -tvf -
    directory/
    ...

This One Weird Trick

Like many a good Unix utility, tar(1) can read input from stdin and write to stdout, making it a flexible tool to use in a pipe. For example, to copy a directory hierarchy from one part of the filesystem to another, one might run:

    $ tar -cf - -C /usr share | tar -xf - -C /backup

Here, we are effectively copying the contents of /usr/share to /backup/share. Note the use of the -C flag to change the working directory of tar(1) during archive creation and extraction.

This concept becomes powerful when you realize that you can use this approach to copy files from one host to another.
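Before pointing such a pipeline at real directories or remote hosts, the local variant can be rehearsed safely. This is a minimal sketch, assuming a scratch directory from mktemp(1); the paths usr/share/doc/note.txt and backup are hypothetical stand-ins for /usr/share and /backup.

```shell
#!/bin/sh
# Hypothetical source and destination trees below a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/usr/share/doc" "$work/backup"
echo "example" > "$work/usr/share/doc/note.txt"

# The writer changes into $work/usr and archives "share" to stdout;
# the reader changes into $work/backup and unpacks the stream from stdin.
tar -cf - -C "$work/usr" share | tar -xf - -C "$work/backup"

# The copy should now exist as $work/backup/share/doc/note.txt.
copied=$(cat "$work/backup/share/doc/note.txt")
echo "$copied"
```

Because both sides use -C, neither the archive member names nor the result depend on the directory the pipeline is started from.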
Consider, for example, two hosts hostA and hostB on which you have an account, but which can't talk to each other directly (for example due to firewall restrictions). From your current system, you could copy a file system hierarchy from one host to the other without any intermediary files by running:

    $ ssh hostA "tar -czf - dir" | ssh hostB "tar -xzf -"

Similar Tools

cpio(1)

Another popular archiving tool is cpio(1), which nowadays finds its most widespread use by way of the rpm(1) package manager as well as part of the initramfs during the Linux boot process.

pax(1)

The original archive format did not include all the information about a file one might wish to retain, and POSIX.1 provided a definition for a new file format as implemented by the pax(1) utility. This tool is backwards compatible and can generally read and write tar(1) archives, but allows for additional features. However, tar(1) remains the most popular archive utility, even though on some systems it actually is implemented by way of pax-as-tar (i.e., pax(1) invoked as tar(1)). pax(1) can also read the cpio(1) format.

tar(1), pax(1), and cpio(1) equivalent invocations
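As a rough sketch of how the three basic operations (create, list, extract) correspond across the tools: the script below assumes pax(1) and cpio(1) may or may not be installed and skips whichever is missing. The names dir/file, with-tar.tar, with-pax.tar, with-cpio.cpio and the out-* directories are hypothetical, and the choice of -H newc for cpio(1) is an assumption made for portability, since the default cpio output format differs between implementations.

```shell
#!/bin/sh
# Scratch tree with hypothetical names; everything happens below $work.
work=$(mktemp -d)
mkdir -p "$work/dir" "$work/out-tar" "$work/out-pax" "$work/out-cpio"
echo "data" > "$work/dir/file"
cd "$work" || exit 1

# tar(1): -c creates, -t lists, -x extracts.
tar -cf with-tar.tar dir
tar_list=$(tar -tf with-tar.tar)
tar -xf with-tar.tar -C out-tar

# pax(1): -w writes an archive, no mode flag lists it, -r reads (extracts) it.
if command -v pax >/dev/null 2>&1; then
    pax -w -f with-pax.tar dir
    pax -f with-pax.tar
    (cd out-pax && pax -r -f ../with-pax.tar)
fi

# cpio(1): takes the names to archive on stdin; -o writes, -it lists,
# -id extracts and creates directories as needed.
if command -v cpio >/dev/null 2>&1; then
    find dir -print | cpio -o -H newc > with-cpio.cpio 2>/dev/null
    cpio -it < with-cpio.cpio 2>/dev/null
    (cd out-cpio && cpio -id < ../with-cpio.cpio 2>/dev/null)
fi

printf '%s\n' "$tar_list"
```

The main structural difference is that tar(1) and pax(1) take the files to archive as arguments and descend into directories themselves, whereas cpio(1) expects a list of names on standard input, typically produced by find(1).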
See also