Diffstat (limited to 'how_to_archive_with_posix_tar_cpio_and_pax.html')
-rw-r--r-- how_to_archive_with_posix_tar_cpio_and_pax.html 238
1 files changed, 238 insertions, 0 deletions
diff --git a/how_to_archive_with_posix_tar_cpio_and_pax.html b/how_to_archive_with_posix_tar_cpio_and_pax.html
new file mode 100644
index 0000000..998afb7
--- /dev/null
+++ b/how_to_archive_with_posix_tar_cpio_and_pax.html
@@ -0,0 +1,238 @@
+<!doctype html>
+<html lang="en">
+<meta charset="utf-8">
+<meta name="viewport" content="width=device-width, initial-scale=1">
+<meta name="author" content="aki">
+<meta name="tags" content="posix, linux, tutorial, archiving, tar, cpio, pax">
+<link rel="icon" type="image/png" href="cylo.png">
+<link rel="stylesheet" type="text/css" href="style.css">
+
+<title>How To Archive With POSIX tar, cpio and pax</title>
+
+<nav><p><a href="https://ignore.pl">ignore.pl</a></p></nav>
+
+<article>
+<h1>How To Archive With POSIX tar, cpio and pax</h1>
+<p class="subtitle">Published on 2020-07-22 22:30:00+02:00
+<p>The usual answer to archiving anything is <a href="https://www.gnu.org/software/tar/">tar</a>. As you may see, I
+intentionally linked to GNU Tar. If you are a *BSD user, then you use some other implementation. Both of them follow
+and extend the POSIX standard for the tar utility. Or so you would think.
+<p>Right now there is no POSIX tar utility. It was marked as legacy
+<a href="https://pubs.opengroup.org/onlinepubs/007908799/xcu/tar.html">already in 1997</a> and disappeared from the
+standard soon after. Its place was taken by a behemoth called
+<a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html">pax</a>. The name gets even funnier when
+you consider the rationale and the size of this thing. But pax didn't come from just tar. There was one more influence
+here, called <a href="https://pubs.opengroup.org/onlinepubs/007908799/xcu/cpio.html">cpio</a>. You may know this one
+if you ever tinkered with RPM packages or initramfs.
+<p>In other words we have three utilities on today's table: tar, cpio and pax. According to
+<a href="https://popcon.debian.org/by_inst">Debian's popularity contest</a> the frequency of each being installed is in
+the exact same order, with tar being at 8th place overall, cpio at 52nd, and pax at 6089th. I can't just talk about the
+least popular one, so I'll briefly explain how to use each of them in your usual Linux distribution while keeping in
+mind what POSIX had to tell us back in the day.
+
+<h2>tar</h2>
+<p>Like I've already mentioned, tarballs are the most popular. Not only that, they are commonly described as the easiest
+to use, although the interface is something that you can find jokes about. All operations on tarballs are handled via
+a single tar utility.</p>
+<img src="how_to_archive_with_posix_tar_cpio_and_pax-1.png" alt="box">
+<p>Let's go through three basic operations: create an archive, list out the content, and extract it. Tar expects its
+first argument to match this regular expression: <code>[rxtuc][vwfblmo]*</code>. The first part is the <em>function</em>,
+and the second is a <em>modifier</em>. I'll focus only on those necessary to accomplish the aforementioned tasks.
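+<p>If you want to check a key string against that expression yourself, something like the sketch below should do. The
+sample strings are my own, and <code>czf</code> is there only to show that the popular <code>z</code> is not among the
+POSIX modifiers. It assumes a <code>grep</code> that supports <code>-E</code> and <code>-x</code>.

```shell
#!/bin/sh
# Sample key strings (made up for this sketch); czf uses z, a common extension.
# -E enables extended regular expressions, -x requires the whole line to match.
matches=$(printf '%s\n' cf tvf xf czf | grep -Ex '[rxtuc][vwfblmo]*')

# Print only the key strings that fit the POSIX pattern.
printf '%s\n' "$matches"
```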
+<p>To create an archive, run:</p>
+<pre>
+$ tar cf ../archive.tar a_file a_directory
+</pre>
+<p>This will create an archive located in the parent directory of the current working directory. It will contain
+<code>a_file</code> and, recursively, <code>a_directory</code>. Let's map every part of the command for clarity:</p>
+<dl>
+ <dt><code>tar</code><dd>Call tar
+ <dt><code>c</code><dd>Create an archive
+ <dt><code>f</code><dd>Use the first argument after <code>cf</code> as the path to the archive
+ <dt><code>../archive.tar</code><dd>Path to the archive (without <code>f</code> it would be treated as another file to
+ include in the archive)
+ <dt><code>a_file a_directory</code><dd>Files to include in the archive
+</dl>
+<p>Now that you have an archive, you can see its content:</p>
+<pre>
+$ tar tf ../archive.tar
+a_file
+a_directory/
+a_directory/another_file
+</pre>
+<p>As you have probably guessed, the <code>t</code> function is used to write the names of files that are in the archive.
+<code>f</code> works exactly the same way: the first argument after <code>tf</code> is meant to point to the archive file.
+<p>To extract everything from the archive, run:</p>
+<pre>
+$ tar xf ../archive.tar
+</pre>
+<p>Or add more arguments to extract selected files:</p>
+<pre>
+$ tar xf ../archive.tar a_file
+</pre>
+<p>This one will extract only <code>a_file</code> from the archive.
+<p>That's pretty much it about tar. There are two more functions: <code>r</code>, which adds a new file to an existing
+archive, and <code>u</code>, which first tries to update the file in the archive if it exists and adds it if it doesn't.
+Note that the usual compression options are not available in POSIX; they are an extension.
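+<p>A rough sketch of the <code>r</code> function, and of compressing without any extension by piping the archive through
+a separate program. The file names and the scratch directory are made up for illustration, and I assume that
+<code>gzip</code> is installed and that your tar accepts <code>-</code> as the archive path for standard output.

```shell
#!/bin/sh
# Hypothetical scratch directory and file names, used only for this sketch.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir a_directory
echo one >a_file
echo two >a_directory/another_file
echo three >late_file

tar cf archive.tar a_file a_directory   # c: create the archive
tar rf archive.tar late_file            # r: append a file to the existing archive
listing=$(tar tf archive.tar)           # t: list the names stored in the archive
printf '%s\n' "$listing"

# No compression modifier here: write the archive to standard output
# (assumption: "-" is accepted as the archive path) and let gzip compress it.
tar cf - a_file a_directory | gzip >archive.tar.gz
```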
+
+<h2>cpio</h2>
+<p>Heading off from the usual routes we encounter cpio. It's a more frequent sight than pax, but it still is quite niche
+compared to tar's omnipresence. Frankly, I like this one the most because of the way it handles input of file lists.
+Sadly, this also makes it slightly bothersome to use.
+<p>Now, now, cpio operates in three modes: <em>copy-out</em>, <em>copy-in</em> and <em>pass-through</em>. Our goals are
+still the same: to create an archive, list files inside, and extract it somewhere else. For that we'll only need the
+first two modes.
+<p>To create an archive, use the copy-out mode, as in: <em>copy</em> to the standard <em>out</em>put:</p>
+<pre>
+$ find a_file a_directory | cpio -o &gt;../archive.cpio
+</pre>
+<p>You probably noticed right away that cpio doesn't accept files as arguments. In copy-out mode it expects a list of
+files on standard input, and it will return the formatted archive through standard output. Here is a more or less
+step-by-step explanation:</p>
+<dl>
+ <dt><code>find a_file a_directory |</code><dd>List files, directories and their content from arguments and pipe the
+ output to the next command
+ <dt><code>cpio</code><dd>Call cpio (duh!)
+ <dt><code>-o</code><dd>Use copy-out mode
+ <dt><code>&gt;../archive.cpio</code><dd>Redirect standard output of cpio to a file
+</dl>
+<p>You now have an archive file called <code>archive.cpio</code> in the parent directory. To see its content, type:</p>
+<pre>
+$ cpio -it &lt;../archive.cpio
+a_file
+a_directory
+a_directory/another_file
+1 block
+</pre>
+<p>Nice! What's left is extraction. You do it with copy-in mode like this:</p>
+<pre>
+$ cpio -i &lt;../archive.cpio
+1 block
+</pre>
+<p>Huh? What's that? Listing files and extracting both use copy-in mode? That's right. Just like "copy-out" means "copy
+to standard output", "copy-in" can be understood as "copy from standard input". The <code>t</code> option prevents any
+files from being written or created by cpio; nonetheless, the archive is read from standard input and then translated
+to a list of files on standard output. Some extended implementations let you use <code>t</code> directly as the sole
+option and imply the copy-in mode.
+<p>You can also use patterns when extracting to select files:</p>
+<pre>
+$ cpio -i a_file &lt;../archive.cpio
+1 block
+</pre>
+<p>You can copy nested files if you use the <code>d</code> option:</p>
+<pre>
+$ cpio -id a_directory/another_file &lt;../archive.cpio
+1 block
+</pre>
+<p>This option tells cpio that it's allowed to create directories whenever it is necessary.</p>
+<img src="how_to_archive_with_posix_tar_cpio_and_pax-2.png" alt="pass-through">
+<p>Bonus! Pass-through mode can be used to copy files listed on standard input to a specified directory. It doesn't
+create an archive at all.</p>
+<pre>
+$ ls ../destination
+$ ls
+a_directory a_file
+$ find a_file a_directory | cpio -p ../destination
+0 blocks
+$ ls ../destination
+a_directory a_file
+</pre>
+
+<h2>pax</h2>
+<p>Finally, at the destination! This one lives up to the name of this post as it's still part of POSIX. The fun part is
+that you probably don't even have it installed, but don't worry, I didn't have it until like two days ago. It truly
+feels like a compromise forced on you and your siblings by your parents. Jokes aside, I actually started to like it,
+bulky but kind of cute.
+<p>Anyway, let's see what this coffee machine can do for us; same goals as previously. This will be confusing, because
+this utility is a compromise, and so it supports both usage styles: tar-like and cpio-like.
+<p>To create an archive you can use any of:</p>
+
+<pre>
+$ pax -wf ../archive.pax a_directory a_file
+$ find a_file a_directory | pax -wd &gt;../archive.pax
+$ find a_file a_directory | pax -wdf ../archive.pax
+</pre>
+
+<p>They are equivalent. You can mix the styles as much as you want; as long as it doesn't become a mess, it's quite
+handy. As for what each option does:</p>
+
+<dl>
+ <dt><code>-w</code><dd>Indicates that pax will act in write mode (tar's <code>c</code> and cpio's <code>-o</code>)
+ <dt><code>f ../archive.pax</code><dd>Argument after <code>f</code> is the path to the archive; note that it behaves
+ slightly differently compared to tar: it always takes the next argument instead of the first path that appears after
+ the flags. It means you can't put any options between <code>-f</code> and the path.
+ <dt><code>a_directory a_file</code>
+ <dt><code>find a_file a_directory |</code><dd>Both of these accomplish the same goal of letting <code>pax</code> know
+ which files should be in the archive. They are mutually exclusive! If there is at least one argument pointing to a
+ file, then standard input is not supposed to be read.
+ <dt><code>d</code><dd>This one is used to prevent recursively adding files that are in a directory, so that the
+ behaviour is the same as in cpio:
+<pre>
+$ find a_directory a_file | pax -wvf ../archive.pax
+a_directory
+<span style="color: red">a_directory/another_file
+a_directory/another_file</span>
+a_file
+pax: ustar vol 1, 4 files, 0 bytes read, 10240 bytes written.
+$ find a_directory a_file | pax -wv<span style="color: green">d</span>f ../archive.pax
+a_directory
+<span style="color: green">a_directory/another_file</span>
+a_file
+pax: ustar vol 1, 3 files, 0 bytes read, 10240 bytes written.
+</pre>
+</dl>
+
+<p>The <code>v</code> option is used to increase verbosity of the "error" output. You can find similar functionality in
+most command line utilities, including tar and cpio.
+<p>To list files that are in the archive you can also use both styles:</p>
+<pre>
+$ pax &lt;../archive.pax
+a_directory
+a_directory/another_file
+a_file
+$ pax -f ../archive.pax
+a_directory
+a_directory/another_file
+a_file
+</pre>
+<p>Yes, that's the default behaviour of pax and you don't need to specify any argument (in case of cpio-like style).
+Sweet, isn't it?
+<p>To extract the archive use one of:</p>
+<pre>
+$ pax -r &lt;../archive.pax
+$ pax -rf ../archive.pax
+</pre>
+<p>For selecting files to extract use the usual patterns:</p>
+<pre>
+$ pax -r a_file -f ../archive.pax
+$ pax -r a_directory/another_file &lt;../archive.pax
+</pre>
+<p>That covers the most basic use cases. There's more: for instance, pax supports a mode similar to the pass-through
+mode we already know from cpio. But there is something more important to mention about pax. It's supposed to easily
+support various different formats.
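+<p>Before moving on to formats, here is a minimal sketch of that pass-through-like mode, assuming pax is installed (the
+script only prints a note otherwise). The directory and file names are hypothetical. With both <code>-r</code> and
+<code>-w</code> given, the last operand is taken as the destination directory.

```shell
#!/bin/sh
# Hypothetical scratch layout: a source tree and an empty destination next to it.
work=$(mktemp -d)
mkdir "$work/source" "$work/destination"
cd "$work/source" || exit 1
mkdir a_directory
echo one >a_file
echo two >a_directory/another_file

if command -v pax >/dev/null 2>&1; then
    # -r and -w together select copy mode; no archive file is created.
    pax -rw a_file a_directory ../destination
    ls ../destination
else
    echo "pax is not installed" >&2
fi
```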
+<p>POSIX says that pax should support the pax, cpio and ustar formats. I installed GNU pax and it seems to support: ar,
+bcpio, cpio, sv4cpio, sv4crc, tar and ustar. The default format for my installation is ustar, as you have probably
+noticed in the verbose output in one of the examples above. The pax format is an extension of ustar; that's most likely
+the reason it's usually omitted.
+<p>You can select the format with the <code>-x</code> option; for supported formats please refer to your manual. Also
+note that explicitly specifying the format should only be needed when writing an archive. When reading, pax can identify
+the archive's format by itself:</p>
+<pre>
+$ find a_file a_directory | cpio -o &gt;../archive.cpio
+$ pax -vf ../archive.cpio
+-rw-rw-r-- 1 ignore ignore 0 Jul 22 22:30 a_file
+drwxrwxr-x 2 ignore ignore 0 Jul 22 22:30 a_directory
+-rw-rw-r-- 1 ignore ignore 0 Jul 22 22:30 a_directory/another_file
+pax: bcpio vol 1, 3 files, 512 bytes read, 0 bytes written.
+</pre>
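+<p>For the writing side, a small sketch of <code>-x</code>: it asks for the <code>cpio</code> format, one of the three
+that POSIX lists, and then reads the archive back without naming any format. It assumes pax is installed and only prints
+a note otherwise; the file names and the scratch directory are made up.

```shell
#!/bin/sh
# Hypothetical scratch directory and file names, used only for this sketch.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir a_directory
echo one >a_file
echo two >a_directory/another_file

if command -v pax >/dev/null 2>&1; then
    # -x cpio picks the archive format explicitly when writing.
    pax -w -x cpio -f archive.cpio a_file a_directory
    # No -x here: pax is left to recognise the format while listing.
    listing=$(pax -f archive.cpio)
    printf '%s\n' "$listing"
else
    echo "pax is not installed" >&2
fi
```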
+
+<h2>Final thoughts</h2>
+<p>Now then, it's time to finally wrap it all up. There is nothing left to say but to remember to always check your
+manual: all of those utilities have various implementations that are compliant with POSIX to varying degrees. Don't be
+naive and don't get tricked by them. I find pax the most reliable of them, as its "novelty" and an interface that was
+quite "modern" from the start resulted in decently compliant implementations. Moreover, it includes nice things one may
+know from both cpio and tar. Find a moment to check it out!
+<p>Let's pretend that <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ar.html">ar</a> doesn't exist.
+Thank you.</p>
+<img src="how_to_archive_with_posix_tar_cpio_and_pax-3.png" alt="boo!">
+</article>
+<script src="https://stats.ignore.pl/track.js"></script>