Diffstat (limited to 'archiving_with_posix_utilities.html')
-rw-r--r-- archiving_with_posix_utilities.html 238
1 files changed, 0 insertions, 238 deletions
diff --git a/archiving_with_posix_utilities.html b/archiving_with_posix_utilities.html
deleted file mode 100644
index 17ce7bc..0000000
--- a/archiving_with_posix_utilities.html
+++ /dev/null
@@ -1,238 +0,0 @@
-<!doctype html>
-<html lang="en">
-<meta charset="utf-8">
-<meta name="viewport" content="width=device-width, initial-scale=1">
-<meta name="author" content="aki">
-<meta name="tags" content="posix, linux, tutorial, archiving, tar, cpio, pax">
-<link rel="icon" type="image/png" href="cylo.png">
-<link rel="stylesheet" type="text/css" href="style.css">
-
-<title>Archiving With POSIX Utilities</title>
-
-<nav><p><a href="https://ignore.pl">ignore.pl</a></p></nav>
-
-<article>
-<h1>Archiving With POSIX Utilities</h1>
-<p class="subtitle">Published on 2020-07-22 22:30:00+02:00
-<p>The usual answer is <a href="https://www.gnu.org/software/tar/">tar</a>. As you can see, I intentionally linked to
-GNU Tar. If you are a *BSD user, you use some other implementation. Both follow and extend the POSIX standard for the
-tar utility. Or so you would think.
-<p>Right now there is no POSIX tar utility. It was marked as legacy
-<a href="https://pubs.opengroup.org/onlinepubs/007908799/xcu/tar.html">already in 1997</a> and disappeared from the
-standard soon after. Its place was taken by a behemoth called
-<a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html">pax</a>. The name gets even funnier when
-you consider the rationale and the size of this thing. But pax didn't come from tar alone. There was one more influence
-here, called <a href="https://pubs.opengroup.org/onlinepubs/007908799/xcu/cpio.html">cpio</a>. You may know this one
-if you have ever tinkered with RPM packages or initramfs.
-<p>In other words, we have three utilities on today's table: tar, cpio and pax. According to
-<a href="https://popcon.debian.org/by_inst">Debian's popularity contest</a>, they rank in that same order by number of
-installations, with tar in 8th place overall, cpio in 52nd, and pax in 6089th. I can't just talk about the
-least popular one, so I'll briefly explain how to use each of them in your usual Linux distribution while keeping in
-mind what POSIX had to tell us back in the day.
-
-<h2>tar</h2>
-<p>As I've already mentioned, tarballs are the most popular. Not only that, they are commonly described as the easiest
-to use, although the interface is something that you can find jokes about. All operations on tarballs are handled by a
-single tar utility.</p>
-<img src="archiving_with_posix_utilities-1.png" alt="box">
-<p>Let's go through three basic operations: create an archive, list its content, and extract it. Tar expects its
-first argument to match this regular expression: <code>[rxtuc][vwfblmo]*</code>. The first part is the <em>function</em>,
-and the rest are <em>modifiers</em>. I'll focus only on those necessary to accomplish the aforementioned tasks.
-<p>To create an archive, run:</p>
-<pre>
-$ tar cf ../archive.tar a_file a_directory
-</pre>
-<p>This will create an archive located in the parent directory of the current working directory, containing
-<code>a_file</code> and, recursively, <code>a_directory</code>. Let's map every part of the command for clarity:</p>
-<dl>
- <dt><code>tar</code><dd>Call tar
- <dt><code>c</code><dd>Create an archive
- <dt><code>f</code><dd>Use the first argument after <code>cf</code> as the path to the archive
- <dt><code>../archive.tar</code><dd>Path to the archive (without <code>f</code> it would be treated as another file to
- include in the archive)
- <dt><code>a_file a_directory</code><dd>Files to include in the archive
-</dl>
-<p>Now that you have an archive, you can see its content:</p>
-<pre>
-$ tar tf ../archive.tar
-a_file
-a_directory/
-a_directory/another_file
-</pre>
-<p>As you have probably guessed, the <code>t</code> function writes the names of the files that are in the archive.
-<code>f</code> works exactly the same way: the first argument after <code>tf</code> points to the archive file.
-<p>To extract everything from the archive, run:</p>
-<pre>
-$ tar xf ../archive.tar
-</pre>
-<p>Or add more arguments to extract selected files:</p>
-<pre>
-$ tar xf ../archive.tar a_file
-</pre>
-<p>This one will extract only <code>a_file</code> from the archive.
-<p>That's pretty much it for tar. There are two more functions: <code>r</code>, which appends new files to an existing
-archive, and <code>u</code>, which adds a file only if it is not yet in the archive or has been modified since it was
-last written there. Note that the usual compression options are not part of POSIX; they are extensions.
-
-<h2>cpio</h2>
-<p>Heading off the usual routes, we encounter cpio. It's a more frequent sight than pax, but it is still quite niche
-compared to tar's omnipresence. Frankly, I like this one the most because of the way it handles input of file lists.
-Sadly, this also makes it slightly bothersome to use.
-<p>Now, cpio operates in three modes: <em>copy-out</em>, <em>copy-in</em> and <em>pass-through</em>. Our goals are
-still the same: to create an archive, list the files inside, and extract it somewhere else. For that we'll only need
-the first two modes.
-<p>To create an archive, use the copy-out mode, as in: <em>copy</em> to the standard <em>out</em>put:</p>
-<pre>
-$ find a_file a_directory | cpio -o &gt;../archive.cpio
-</pre>
-<p>You probably noticed right away that cpio doesn't accept files as arguments. In copy-out mode it expects a list of
-files on standard input, and it writes the formatted archive to standard output. Here is a step-by-step
-explanation:</p>
-<dl>
- <dt><code>find a_file a_directory |</code><dd>List files, directories and their content from arguments and pipe the
- output to the next command
- <dt><code>cpio</code><dd>Call cpio (duh!)
- <dt><code>-o</code><dd>Use copy-out mode
- <dt><code>&gt;../archive.cpio</code><dd>Redirect standard output of cpio to a file
-</dl>
-<p>You now have an archive file called <code>archive.cpio</code> in the parent directory. To see its content, type:</p>
-<pre>
-$ cpio -it &lt;../archive.cpio
-a_file
-a_directory
-a_directory/another_file
-1 block
-</pre>
-<p>Nice! What's left is extraction. You do it with copy-in mode like this:</p>
-<pre>
-$ cpio -i &lt;../archive.cpio
-1 block
-</pre>
-<p>Huh? What's that? Listing files and extracting both use copy-in mode? That's right. Just as "copy-out" means "copy to
-standard output", "copy-in" can be understood as "copy from standard input". The <code>t</code> option prevents cpio
-from writing or creating any files; nonetheless, the archive is read from standard input and then translated into a
-list of files on standard output. Some extended implementations let you use <code>t</code> as the sole option and imply
-the copy-in mode.
-<p>You can also use patterns when extracting to select files:</p>
-<pre>
-$ cpio -i a_file &lt;../archive.cpio
-1 block
-</pre>
-<p>You can extract nested files if you use the <code>d</code> option:</p>
-<pre>
-$ cpio -id a_directory/another_file &lt;../archive.cpio
-1 block
-</pre>
-<p>This option tells cpio that it's allowed to create directories whenever it is necessary.</p>
-<img src="archiving_with_posix_utilities-2.png" alt="pass-through">
-<p>Bonus! Pass-through mode can be used to copy files listed on standard input to a specified directory. It doesn't
-create an archive at all.</p>
-<pre>
-$ ls ../destination
-$ ls
-a_directory a_file
-$ find a_file a_directory | cpio -p ../destination
-0 blocks
-$ ls ../destination
-a_directory a_file
-</pre>
-
-<h2>pax</h2>
-<p>Finally, at the destination! This one lives up to the name of this post as it's still part of POSIX. The fun part is
-that you probably don't even have it installed, but don't worry, I didn't have it until like two days ago. It truly
-feels like a compromise forced on you and your siblings by your parents. Jokes aside, I actually started to like it,
-bulky but kind of cute.
-<p>Anyway, let's see what this coffee machine can do for us; same goals as previously. This will be confusing, because
-this utility is a compromise, and so it supports both usage styles: tar-like and cpio-like.
-<p>To create an archive, you can use any of:</p>
-
-<pre>
-$ pax -wf ../archive.pax a_directory a_file
-$ find a_file a_directory | pax -wd &gt;../archive.pax
-$ find a_file a_directory | pax -wdf ../archive.pax
-</pre>
-
-<p>They are equivalent. You can mix the styles as much as you want; as long as it doesn't become a mess, it's quite
-handy. As for which option does what:</p>
-
-<dl>
- <dt><code>-w</code><dd>Indicates that pax will act in write mode (tar's <code>c</code> and cpio's <code>-o</code>)
- <dt><code>f ../archive.pax</code><dd>The argument after <code>f</code> is the path to the archive; note that it behaves
- slightly differently from tar: it always takes the next argument instead of the first path that appears after the
- flags. It means you can't put any options between <code>-f</code> and the path.
- <dt><code>a_directory a_file</code>
- <dt><code>find a_file a_directory |</code><dd>Both of these accomplish the same goal of letting <code>pax</code> know
- which files should be in the archive. They are mutually exclusive! If there is at least one argument pointing to a
- file, then standard input is not supposed to be read.
- <dt><code>d</code><dd>This one is used to prevent recursively adding files that are in a directory, so that the
- behaviour is the same as in cpio:
-<pre>
-$ find a_file a_directory | pax -wvf ../archive.pax
-a_directory
-<span style="color: red">a_directory/another_file
-a_directory/another_file</span>
-a_file
-pax: ustar vol 1, 4 files, 0 bytes read, 10240 bytes written.
-$ find a_directory a_file | pax -wv<span style="color: green">d</span>f ../archive.pax
-a_directory
-<span style="color: green">a_directory/another_file</span>
-a_file
-pax: ustar vol 1, 3 files, 0 bytes read, 10240 bytes written.
-</pre>
-</dl>
-
-<p>The <code>v</code> option increases the verbosity of the "error" output. You can find similar functionality in
-most command-line utilities, including tar and cpio.
-<p>To list the files that are in an archive, you can also use both styles:</p>
-<pre>
-$ pax &lt;../archive.pax
-a_directory
-a_directory/another_file
-a_file
-$ pax -f ../archive.pax
-a_directory
-a_directory/another_file
-a_file
-</pre>
-<p>Yes, that's the default behaviour of pax, and you don't need to specify any argument (in the case of the cpio-like
-style). Sweet, isn't it?
-<p>To extract the archive use one of:</p>
-<pre>
-$ pax -r &lt;../archive.pax
-$ pax -rf ../archive.pax
-</pre>
-<p>For selecting files to extract use the usual patterns:</p>
-<pre>
-$ pax -r a_file -f ../archive.pax
-$ pax -r a_directory/another_file &lt;../archive.pax
-</pre>
-<p>That covers the most basic use cases. There's more; for instance, pax supports a mode similar to the pass-through
-mode we already know from cpio. But there is something more important to mention about pax. It's supposed to easily
-support various different formats.
-<p>POSIX says that pax should support the pax, cpio and ustar formats. I installed GNU pax and it seems to support: ar,
-bcpio, cpio, sv4cpio, sv4crc, tar and ustar. The default format for my installation is ustar, as you have probably
-noticed in the verbose output in one of the examples above. The pax format is an extension of ustar; that's most likely
-the reason it's usually omitted.
-<p>You can select the format with the <code>-x</code> option; for supported formats please refer to your manual. Also
-note that explicitly specifying the format should only be needed when writing an archive. When reading, pax can identify
-the archive's format on its own:</p>
-<pre>
-$ find a_file a_directory | cpio -o &gt;../archive.cpio
-$ pax -vf ../archive.cpio
--rw-rw-r-- 1 ignore ignore 0 Jul 22 22:30 a_file
-drwxrwxr-x 2 ignore ignore 0 Jul 22 22:30 a_directory
--rw-rw-r-- 1 ignore ignore 0 Jul 22 22:30 a_directory/another_file
-pax: bcpio vol 1, 3 files, 512 bytes read, 0 bytes written.
-</pre>
-
-<h2>Final thoughts</h2>
-<p>Now then, it's time to finally wrap it all up. There is nothing left to say except: remember to always check your
-manual. All of those utilities have various implementations that comply with POSIX to varying degrees. Don't be naive
-and don't get tricked by them. I find pax the most reliable of them, as its "novelty" and an interface that was quite
-"modern" from the start resulted in decently compliant implementations. Moreover, it includes nice things one may know
-from both cpio and tar. Find a moment to check it out!
-<p>Let's pretend that <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ar.html">ar</a> doesn't exist.
-Thank you.</p>
-<img src="archiving_with_posix_utilities-3.png" alt="boo!">
-</article>
-<script src="https://stats.ignore.pl/track.js"></script>