author | Aki <please@ignore.pl> | 2021-07-25 19:46:25 +0200 |
---|---|---|
committer | Aki <please@ignore.pl> | 2021-07-25 19:46:25 +0200 |
commit | 8e1f3c9ebc0ccd132e3836f3d198415a15932877 (patch) | |
tree | 624fa91d8b9cdc9353f5db958d2d456372442e26 /how_to_archive_with_posix_tar_cpio_and_pax.html | |
parent | c0b3870dde1d355de40515376ffd5bc87442e21f (diff) | |
Renamed guides to include "How To" in their names
Diffstat (limited to 'how_to_archive_with_posix_tar_cpio_and_pax.html')
-rw-r--r-- | how_to_archive_with_posix_tar_cpio_and_pax.html | 238 |
1 files changed, 238 insertions, 0 deletions
diff --git a/how_to_archive_with_posix_tar_cpio_and_pax.html b/how_to_archive_with_posix_tar_cpio_and_pax.html
new file mode 100644
index 0000000..998afb7
--- /dev/null
+++ b/how_to_archive_with_posix_tar_cpio_and_pax.html
@@ -0,0 +1,238 @@
+<!doctype html>
+<html lang="en">
+<meta charset="utf-8">
+<meta name="viewport" content="width=device-width, initial-scale=1">
+<meta name="author" content="aki">
+<meta name="tags" content="posix, linux, tutorial, archiving, tar, cpio, pax">
+<link rel="icon" type="image/png" href="cylo.png">
+<link rel="stylesheet" type="text/css" href="style.css">
+
+<title>How To Archive With POSIX tar, cpio and pax</title>
+
+<nav><p><a href="https://ignore.pl">ignore.pl</a></p></nav>
+
+<article>
+<h1>How To Archive With POSIX tar, cpio and pax</h1>
+<p class="subtitle">Published on 2020-07-22 22:30:00+02:00
+<p>The usual answer to archiving anything is <a href="https://www.gnu.org/software/tar/">tar</a>. As you can see, I
+intentionally linked to GNU Tar. If you are a *BSD user then you use some other implementation. Both of them follow
+and extend the POSIX standard for the tar utility. Or so you would think.
+<p>Right now there is no POSIX tar utility. It was marked as legacy
+<a href="https://pubs.opengroup.org/onlinepubs/007908799/xcu/tar.html">already in 1997</a> and disappeared from the
+standard soon after. Its place was taken by a behemoth called
+<a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html">pax</a>. The name gets even funnier when
+you consider the rationale and the size of this thing. But pax didn't come from just tar. There was one more influence
+here, called <a href="https://pubs.opengroup.org/onlinepubs/007908799/xcu/cpio.html">cpio</a>. You may know this one
+if you have ever tinkered with RPM packages or initramfs.
+<p>In other words we have three utilities on today's table: tar, cpio and pax. According to
+<a href="https://popcon.debian.org/by_inst">Debian's popularity contest</a> the frequency of each being installed is in
+the exact same order, with tar being at 8th place overall, cpio at 52nd, and pax at 6089th. I can't just talk about the
+least popular one, so I'll briefly explain how to use each of them in your usual Linux distribution while keeping in
+mind what POSIX had to tell us back in the day.
+
+<h2>tar</h2>
+<p>Like I've already mentioned, tarballs are the most popular. Not only that, they are commonly described as the easiest
+to use, although the interface is something that you can find jokes about. All operations on tarballs are handled via
+a single tar utility.</p>
+<img src="how_to_archive_with_posix_tar_cpio_and_pax-1.png" alt="box">
+<p>Let's go through three basic operations: create an archive, list out the content, and extract it. Tar expects its
+first argument to match this regular expression: <code>[rxtuc][vwfblmo]*</code>. The first part is the
+<em>function</em>, and the second is a <em>modifier</em>. I'll focus only on those necessary to accomplish the
+aforementioned tasks.
+<p>To create an archive:</p>
+<pre>
+$ tar cf ../archive.tar a_file a_directory
+</pre>
+<p>This will create an archive located in the parent directory of the current working directory. It will contain
+<code>a_file</code> and, recursively, <code>a_directory</code>. Let's map every part of the command for clarity:</p>
+<dl>
+ <dt><code>tar</code><dd>Call tar
+ <dt><code>c</code><dd>Create an archive
+ <dt><code>f</code><dd>Use the first argument after <code>cf</code> as the path to the archive
+ <dt><code>../archive.tar</code><dd>Path to the archive (without <code>f</code> it would be treated as another file to
+ include in the archive)
+ <dt><code>a_file a_directory</code><dd>Files to include in the archive
+</dl>
+<p>Now that you have an archive, you can see its content:</p>
+<pre>
+$ tar tf ../archive.tar
+a_file
+a_directory/
+a_directory/another_file
+</pre>
+<p>As you have probably guessed, the <code>t</code> function is used to write the names of files that are in the archive.
+<code>f</code> works exactly the same way: the first argument after <code>tf</code> is meant to point to the archive file.
+<p>To extract everything from the archive:</p>
+<pre>
+$ tar xf ../archive.tar
+</pre>
+<p>Or add more arguments to extract selected files:</p>
+<pre>
+$ tar xf ../archive.tar a_file
+</pre>
+<p>This one will extract only <code>a_file</code> from the archive.
+<p>That's pretty much it about tar. There are two more functions: <code>r</code> that adds a new file to an existing
+archive, and <code>u</code> that first tries to update the file in the archive if it exists and adds it if it doesn't.
+Note that the usual compression options are not available in POSIX; they are an extension.
+
+<h2>cpio</h2>
+<p>Heading off the usual routes, we encounter cpio. It's a more frequent sight than pax, but it is still quite niche
+compared to tar's omnipresence. Frankly, I like this one the most because of the way it handles input of file lists.
+Sadly, this also makes it slightly bothersome to use.
+<p>Now, now, cpio operates in three modes: <em>copy-out</em>, <em>copy-in</em> and <em>pass-through</em>. Our goals are
+still the same: to create an archive, list files inside, and extract it somewhere else, and for that we'll only need
+the first two modes.
+<p>To create an archive, use the copy-out mode, as in: <em>copy</em> to the standard <em>out</em>put:</p>
+<pre>
+$ find a_file a_directory | cpio -o >../archive.cpio
+</pre>
+<p>By now you have probably noticed that cpio doesn't accept files as arguments. In copy-out mode it expects a list of
+files on standard input, and it writes the formatted archive to standard output. Here is a more or less step-by-step
+explanation:</p>
+<dl>
+ <dt><code>find a_file a_directory |</code><dd>List files, directories and their content from arguments and pipe the
+ output to the next command
+ <dt><code>cpio</code><dd>Call cpio (duh!)
+ <dt><code>-o</code><dd>Use copy-out mode
+ <dt><code>>../archive.cpio</code><dd>Redirect standard output of cpio to a file
+</dl>
+<p>You now have an archive file called <code>archive.cpio</code> in the parent directory. To see its content, type:</p>
+<pre>
+$ cpio -it <../archive.cpio
+a_file
+a_directory
+a_directory/another_file
+1 block
+</pre>
+<p>Nice! What's left is extraction. You do it with copy-in mode like this:</p>
+<pre>
+$ cpio -i <../archive.cpio
+1 block
+</pre>
+<p>Huh? What's that? Listing files and extracting both use copy-in mode? That's right. Just like "copy-out" means "copy
+to standard output", "copy-in" can be understood as "copy from standard input". The <code>t</code> option prevents cpio
+from writing or creating any files; nonetheless, the archive is read from standard input and then translated to a list
+of files on standard output. Some extended implementations let you use <code>t</code> directly as the sole option and
+imply the copy-in mode.
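<p>To tie the three cpio steps together, here is a minimal sketch of a full round trip in a throwaway directory. It
assumes a GNU-style cpio that accepts <code>-o</code> without an explicit format (BusyBox, for one, may not). The
scratch layout under <code>$work</code> and the variable names are made up for illustration; they are not part of the
examples above.

```shell
#!/bin/sh
# Sketch only: create, list and extract a cpio archive in a scratch directory.
set -eu

# Assumption: nothing to demonstrate without cpio, so skip quietly.
command -v cpio >/dev/null 2>&1 || { echo "cpio is not installed, skipping" >&2; exit 0; }

work=$(mktemp -d)                         # hypothetical scratch directory
mkdir -p "$work/src/a_directory" "$work/dst"
echo hello >"$work/src/a_file"
echo world >"$work/src/a_directory/another_file"

# copy-out: file names come in on standard input, the archive goes to standard output
(cd "$work/src" && find a_file a_directory | cpio -o >"$work/archive.cpio" 2>/dev/null)

# copy-in with t: nothing is written to disk, only names are printed
listing=$(cpio -it <"$work/archive.cpio" 2>/dev/null)
printf '%s\n' "$listing"

# copy-in with d: extract into another directory, creating directories when needed
(cd "$work/dst" && cpio -id <"$work/archive.cpio" 2>/dev/null)
ls -R "$work/dst"
```

<p>The block counts that cpio prints on the "error" output are silenced here to keep the listing clean; drop the
<code>2&gt;/dev/null</code> redirections if you want to see them.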
+<p>You can also use patterns when extracting to select files:</p>
+<pre>
+$ cpio -i a_file <../archive.cpio
+1 block
+</pre>
+<p>You can copy nested files if you use the <code>d</code> option:</p>
+<pre>
+$ cpio -id a_directory/another_file <../archive.cpio
+1 block
+</pre>
+<p>This option tells cpio that it's allowed to create directories whenever it is necessary.</p>
+<img src="how_to_archive_with_posix_tar_cpio_and_pax-2.png" alt="pass-through">
+<p>Bonus! Pass-through mode can be used to copy files listed on standard input to a specified directory. It doesn't
+create an archive at all.</p>
+<pre>
+$ ls ../destination
+$ ls
+a_directory a_file
+$ find a_file a_directory | cpio -p ../destination
+0 blocks
+$ ls ../destination
+a_directory a_file
+</pre>
+
+<h2>pax</h2>
+<p>Finally, at the destination! This one lives up to the name of this post as it's still part of POSIX. The fun part is
+that you probably don't even have it installed, but don't worry, I didn't have it until like two days ago. It truly
+feels like a compromise forced on you and your siblings by your parents. Jokes aside, I actually started to like it,
+bulky but kind of cute.
+<p>Anyway, let's see what this coffee machine can do for us; same goals as previously. This will be confusing, because
+this utility is a compromise, and so it supports both usage styles: tar-like and cpio-like.
+<p>To create an archive you can use either:</p>
+
+<pre>
+$ pax -wf ../archive.pax a_directory a_file
+$ find a_file a_directory | pax -wd >../archive.pax
+$ find a_file a_directory | pax -wdf ../archive.pax
+</pre>
+
+<p>They are equivalent. You can mix the styles as much as you want; as long as it doesn't become a mess, it's quite
+handy. As for what option does what:</p>
+
+<dl>
+ <dt><code>-w</code><dd>Indicates that pax will act in write mode (tar's <code>c</code> and cpio's <code>-o</code>)
+ <dt><code>f ../archive.pax</code><dd>Argument after <code>f</code> is the path to the archive; note that it behaves
+ slightly differently compared to tar: it always takes the next argument instead of the first path that appears after
+ the flags. It means you can't put any options between <code>-f</code> and the path.
+ <dt><code>a_directory a_file</code>
+ <dt><code>find a_file a_directory |</code><dd>Both of these accomplish the same goal of letting <code>pax</code> know
+ what files should be in the archive. They are mutually exclusive! If there is at least one argument pointing to a
+ file, then standard input is not supposed to be read.
+ <dt><code>d</code><dd>This one is used to prevent recursively adding files that are in a directory, so that the
+ behaviour is the same as in cpio:
+<pre>
+$ find a_file a_directory | pax -wvf ../archive.pax
+a_directory
+<span style="color: red">a_directory/another_file
+a_directory/another_file</span>
+a_file
+pax: ustar vol 1, 4 files, 0 bytes read, 10240 bytes written.
+$ find a_directory a_file | pax -wv<span style="color: green">d</span>f ../archive.pax
+a_directory
+<span style="color: green">a_directory/another_file</span>
+a_file
+pax: ustar vol 1, 3 files, 0 bytes read, 10240 bytes written.
+</pre>
+</dl>
+
+<p>The <code>v</code> option is used to increase verbosity of the "error" output. You can find similar functionality in
+most command line utilities, including tar and cpio.
+<p>To list files that are in the archive you can also use both styles:</p>
+<pre>
+$ pax <../archive.pax
+a_directory
+a_directory/another_file
+a_file
+$ pax -f ../archive.pax
+a_directory
+a_directory/another_file
+a_file
+</pre>
+<p>Yes, that's the default behaviour of pax and you don't need to specify any argument (in case of the cpio-like
+style). Sweet, isn't it?
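<p>If you want to convince yourself that the two styles really end up with the same archive members, a small sketch
like the one below may help. It assumes that some pax implementation is installed; the scratch directory and the archive
names <code>tar_style.pax</code> and <code>cpio_style.pax</code> are hypothetical and exist only for this comparison.

```shell
#!/bin/sh
# Sketch only: write one archive per style, then compare what pax lists for each.
set -eu

# Assumption: pax is often not installed by default, so skip quietly without it.
command -v pax >/dev/null 2>&1 || { echo "pax is not installed, skipping" >&2; exit 0; }

work=$(mktemp -d)                         # hypothetical scratch directory
mkdir -p "$work/src/a_directory"
echo hello >"$work/src/a_file"
echo world >"$work/src/a_directory/another_file"
cd "$work/src"

# tar-like style: files as operands, archive path given with -f
pax -wf ../tar_style.pax a_file a_directory

# cpio-like style: names on standard input, archive on standard output;
# -d keeps pax from descending into a_directory on its own
find a_file a_directory | pax -wd >../cpio_style.pax

# list mode is the default, so no -r or -w here
tar_style_list=$(pax -f ../tar_style.pax | sort)
cpio_style_list=$(pax <../cpio_style.pax | sort)

if [ "$tar_style_list" = "$cpio_style_list" ]; then
    echo "both archives list the same files"
fi
printf '%s\n' "$tar_style_list"
```

<p>Sorting both listings before comparing them sidesteps any difference in the order in which the members were
written.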
+<p>To extract the archive use one of:</p>
+<pre>
+$ pax -r <../archive.pax
+$ pax -rf ../archive.pax
+</pre>
+<p>For selecting files to extract use the usual patterns:</p>
+<pre>
+$ pax -r a_file -f ../archive.pax
+$ pax -r a_directory/another_file <../archive.pax
+</pre>
+<p>That covers the most basic use cases. There's more, for instance pax supports a mode similar to the pass-through
+mode we already know from cpio. But there is something more important to mention about pax. It's supposed to easily
+support various different formats.
+<p>POSIX says that pax should support the pax, cpio and ustar formats. I installed GNU pax and it seems to support: ar,
+bcpio, cpio, sv4cpio, sv4crc, tar and ustar. The default format for my installation is ustar, as you have probably
+noticed in the verbose output in one of the examples above. The pax format is an extension of ustar; that's most likely
+the reason it's usually omitted.
+<p>You can select the format with the <code>-x</code> option; for supported formats please refer to your manual. Also
+note that explicitly specifying the format should only be needed when writing an archive. When reading, pax can
+identify the archive's format on its own:</p>
+<pre>
+$ find a_file a_directory | cpio -o >../archive.cpio
+$ pax -vf ../archive.cpio
+-rw-rw-r-- 1 ignore ignore 0 Jul 22 22:30 a_file
+drwxrwxr-x 2 ignore ignore 0 Jul 22 22:30 a_directory
+-rw-rw-r-- 1 ignore ignore 0 Jul 22 22:30 a_directory/another_file
+pax: bcpio vol 1, 3 files, 512 bytes read, 0 bytes written.
+</pre>
+
+<h2>Final thoughts</h2>
+<p>Now then, it's time to finally wrap it all up. There is nothing left to say but remember to always check your manual,
+all of those utilities have various implementations that are compliant with POSIX to various degrees. Don't be naive
+and don't get tricked by them. I find pax the most reliable of them as its "novelty" and the interface that was quite
+"modern" from the start resulted in decently compliant implementations. Moreover, it includes nice things one may know
+from both cpio and tar. Find a moment to check it out!
+<p>Let's pretend that <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ar.html">ar</a> doesn't exist.
+Thank you.</p>
+<img src="how_to_archive_with_posix_tar_cpio_and_pax-3.png" alt="boo!">
+</article>
+<script src="https://stats.ignore.pl/track.js"></script>