Diffstat (limited to 'archiving_with_posix_utilities.html')
-rw-r--r-- | archiving_with_posix_utilities.html | 238 |
1 files changed, 0 insertions, 238 deletions
diff --git a/archiving_with_posix_utilities.html b/archiving_with_posix_utilities.html deleted file mode 100644 index 17ce7bc..0000000 --- a/archiving_with_posix_utilities.html +++ /dev/null @@ -1,238 +0,0 @@ -<!doctype html> -<html lang="en"> -<meta charset="utf-8"> -<meta name="viewport" content="width=device-width, initial-scale=1"> -<meta name="author" content="aki"> -<meta name="tags" content="posix, linux, tutorial, archiving, tar, cpio, pax"> -<link rel="icon" type="image/png" href="cylo.png"> -<link rel="stylesheet" type="text/css" href="style.css"> - -<title>Archiving With POSIX Utilities</title> - -<nav><p><a href="https://ignore.pl">ignore.pl</a></p></nav> - -<article> -<h1>Archiving With POSIX Utilities</h1> -<p class="subtitle">Published on 2020-07-22 22:30:00+02:00 -<p>The usual answer is <a href="https://www.gnu.org/software/tar/">tar</a>. As you can see, I intentionally linked to -GNU Tar. If you are a *BSD user then you use some other implementation. Both of them follow and extend the POSIX standard -for the tar utility. Or so you would think. -<p>Right now there is no POSIX tar utility. It was marked as legacy -<a href="https://pubs.opengroup.org/onlinepubs/007908799/xcu/tar.html">already in 1997</a> and disappeared from the -standard soon after. Its place was taken by a behemoth called -<a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html">pax</a>. The name gets even funnier when -you consider the rationale and the size of this thing. But pax didn't come from just tar. There was one more influence -here, called <a href="https://pubs.opengroup.org/onlinepubs/007908799/xcu/cpio.html">cpio</a>. You may know this one -if you ever tinkered with RPM packages or initramfs. -<p>In other words we have three utilities on today's table: tar, cpio and pax. 
According to -<a href="https://popcon.debian.org/by_inst">Debian's popularity contest</a> the frequency of each being installed is in -the exact same order, with tar being at 8th place overall, cpio at 52nd, and pax at 6089th. I can't just talk about the -least popular one, so I'll briefly explain how to use each of them in your usual Linux distribution while keeping in -mind what POSIX had to tell us back in the day. - -<h2>tar</h2> -<p>As I've already mentioned, tarballs are the most popular. Not only that, they are commonly described as the easiest -to use, although the interface is something that you can find jokes about. All operations on tarballs are handled via -a single tar utility.</p> -<img src="archiving_with_posix_utilities-1.png" alt="box"> -<p>Let's go through three basic operations: create an archive, list out the content, and extract it. Tar expects its -first argument to match this regular expression: <code>[rxtuc][vwfblmo]*</code>. The first part is the <em>function</em>, -and the second is a <em>modifier</em>. I'll focus only on those necessary to accomplish the aforementioned tasks. -<p>To create an archive you:</p> -<pre> -$ tar cf ../archive.tar a_file a_directory -</pre> -<p>This will create an archive located in the parent directory of the current working directory, and it will contain -<code>a_file</code> and, recursively, <code>a_directory</code>. 
Let's map every part of the command for clarity:</p> -<dl> - <dt><code>tar</code><dd>Call tar - <dt><code>c</code><dd>Create an archive - <dt><code>f</code><dd>Use the first argument after <code>cf</code> as the path to the archive - <dt><code>../archive.tar</code><dd>Path to the archive (without <code>f</code> it would be treated as another file to - include in the archive) - <dt><code>a_file a_directory</code><dd>Files to include in the archive -</dl> -<p>Now that you have an archive, you can see its content:</p> -<pre> -$ tar tf ../archive.tar -a_file -a_directory/ -a_directory/another_file -</pre> -<p>As you have probably guessed, the <code>t</code> function is used to write the names of files that are in the archive. -<code>f</code> works exactly the same way: the first argument after <code>tf</code> is meant to point to the archive file. -<p>To extract everything from the archive you:</p> -<pre> -$ tar xf ../archive.tar -</pre> -<p>Or add more arguments to extract selected files:</p> -<pre> -$ tar xf ../archive.tar a_file -</pre> -<p>This one will extract only <code>a_file</code> from the archive. -<p>That's pretty much it about tar. There are two more functions: <code>r</code>, which adds a new file to an existing archive, -and <code>u</code>, which updates the file in the archive if it exists and adds it if it doesn't. Note -that the usual compression options are not available in POSIX; they are an extension. - -<h2>cpio</h2> -<p>Heading off the usual routes, we encounter cpio. It's a more frequent sight than pax, but it is still quite niche -compared to tar's omnipresence. Frankly, I like this one the most because of the way it handles input of file lists. -Sadly, this also makes it slightly bothersome to use. -<p>Now, now, cpio operates in three modes: <em>copy-out</em>, <em>copy-in</em> and <em>pass-through</em>. 
Our goals are -still the same: to create an archive, list files inside, and extract it somewhere else, and for that we'll only need the -first two modes. -<p>To create an archive, use the copy-out mode, as in: <em>copy</em> to the standard <em>out</em>put:</p> -<pre> -$ find a_file a_directory | cpio -o >../archive.cpio -</pre> -<p>You probably noticed right away that cpio doesn't accept files as arguments. In copy-out mode it expects a list of -files on standard input, and it writes the formatted archive to standard output. Here is a more or less step-by-step -explanation:</p> -<dl> - <dt><code>find a_file a_directory |</code><dd>List files, directories and their content from arguments and pipe the - output to the next command - <dt><code>cpio</code><dd>Call cpio (duh!) - <dt><code>-o</code><dd>Use copy-out mode - <dt><code>>../archive.cpio</code><dd>Redirect standard output of cpio to a file -</dl> -<p>You now have an archive file called <code>archive.cpio</code> in the parent directory. To see its content, type in:</p> -<pre> -$ cpio -it <../archive.cpio -a_file -a_directory -a_directory/another_file -1 block -</pre> -<p>Nice! What's left is extraction. You do it with copy-in mode like this:</p> -<pre> -$ cpio -i <../archive.cpio -1 block -</pre> -<p>Huh? What's that? Listing files and extracting both use copy-in mode? That's right. Just as "copy-out" means "copy to -standard output", "copy-in" can be understood as "copy from standard input". The <code>t</code> option prevents any -files from being written or created by cpio; nonetheless, the archive is read from standard input and then translated to a list of -files on standard output. Some extended implementations let you use <code>t</code> directly as the sole option and imply -copy-in mode. 
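<p>The copy-out and copy-in steps above can be tried end to end in a scratch directory. What follows is only a sketch: it assumes <code>mktemp -d</code> is available (it is not part of POSIX) and that the local cpio can write an archive without an explicit format option, which some minimal implementations refuse to do. The file contents and the <code>result</code> and <code>listing</code> variables are made up for illustration.

```shell
#!/bin/sh
# Sketch: round trip through cpio in a scratch directory.
# Assumptions: mktemp -d exists (not POSIX) and this cpio can create an
# archive without being told a format. If either fails, the run is skipped.
result=skipped
listing=
if command -v cpio >/dev/null 2>&1; then
    work=$(mktemp -d) || exit 1
    mkdir -p "$work/src/a_directory" "$work/dst"
    echo hello >"$work/src/a_file"
    echo world >"$work/src/a_directory/another_file"

    # copy-out: names arrive on standard input, the archive leaves on standard output
    if (cd "$work/src" && find a_file a_directory | cpio -o >"$work/archive.cpio") 2>/dev/null; then
        # copy-in with t: read the archive and print names, write nothing to disk
        listing=$(cpio -it <"$work/archive.cpio" 2>/dev/null)
        # copy-in with d: extract, creating directories when needed
        (cd "$work/dst" && cpio -id <"$work/archive.cpio") 2>/dev/null
        if cmp -s "$work/src/a_directory/another_file" "$work/dst/a_directory/another_file"; then
            result=ok
        else
            result=mismatch
        fi
    fi
    rm -rf "$work"
fi
echo "cpio round trip: $result"
```

The block counts ("1 block") go to standard error, which is why the sketch silences it and keeps only the list of names.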
-<p>You can also use patterns when extracting to select files:</p> -<pre> -$ cpio -i a_file <../archive.cpio -1 block -</pre> -<p>You can copy nested files if you use the <code>d</code> option:</p> -<pre> -$ cpio -id a_directory/another_file <../archive.cpio -1 block -</pre> -<p>This option tells cpio that it's allowed to create directories whenever it is necessary.</p> -<img src="archiving_with_posix_utilities-2.png" alt="pass-through"> -<p>Bonus! Pass-through mode can be used to copy files listed on standard input to a specified directory. It doesn't create -an archive at all.</p> -<pre> -$ ls ../destination -$ ls -a_directory a_file -$ find a_file a_directory | cpio -p ../destination -0 blocks -$ ls ../destination -a_directory a_file -</pre> - -<h2>pax</h2> -<p>Finally, at the destination! This one lives up to the name of this post as it's still part of POSIX. The fun part is -that you probably don't even have it installed, but don't worry, I didn't have it until like two days ago. It truly -feels like a compromise forced on you and your siblings by your parents. Jokes aside, I actually started to like it: -bulky but kind of cute. -<p>Anyway, let's see what this coffee machine can do for us; same goals as previously. This will be confusing, because -this utility is a compromise, and so it supports both usage styles: tar-like and cpio-like. -<p>To create an archive you can use any of:</p> - -<pre> -$ pax -wf ../archive.pax a_directory a_file -$ find a_file a_directory | pax -wd >../archive.pax -$ find a_file a_directory | pax -wdf ../archive.pax -</pre> - -<p>They are equivalent. You can mix the styles as much as you want; as long as it doesn't become a mess, it's quite handy. 
-As for which option does what:</p> - -<dl> - <dt><code>-w</code><dd>Indicates that pax will act in write mode (tar's <code>c</code> and cpio's <code>-o</code>) - <dt><code>f ../archive.pax</code><dd>The argument after <code>f</code> is the path to the archive; note that it behaves - slightly differently from tar: it always takes the next argument instead of the first path that appears after the flags. This - means you can't put any options between <code>-f</code> and the path. - <dt><code>a_directory a_file</code> - <dt><code>find a_file a_directory |</code><dd>Both of these accomplish the same goal of letting <code>pax</code> know - which files should be in the archive. They are mutually exclusive! If there is at least one argument pointing to a file, - then standard input is not supposed to be read. - <dt><code>d</code><dd>This one is used to prevent files that are in a directory from being added recursively, so that the - behaviour is the same as in cpio: -<pre> -$ find a_directory a_file | pax -wvf ../archive.pax -a_directory -<span style="color: red">a_directory/another_file -a_directory/another_file</span> -a_file -pax: ustar vol 1, 4 files, 0 bytes read, 10240 bytes written. -$ find a_directory a_file | pax -wv<span style="color: green">d</span>f ../archive.pax -a_directory -<span style="color: green">a_directory/another_file</span> -a_file -pax: ustar vol 1, 3 files, 0 bytes read, 10240 bytes written. -</pre> -</dl> - -<p>The <code>v</code> option is used to increase the verbosity of the "error" output. You can find similar functionality in -most command line utilities, including tar and cpio. -<p>To list the files in an archive you can also use both styles:</p> -<pre> -$ pax <../archive.pax -a_directory -a_directory/another_file -a_file -$ pax -f ../archive.pax -a_directory -a_directory/another_file -a_file -</pre> -<p>Yes, that's the default behaviour of pax and you don't need to specify any options (in the cpio-like style). -Sweet, isn't it? 
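<p>The write and list modes described above can be checked with a small script. This is a sketch under assumptions: pax has to be installed (the script reports "skipped" otherwise), <code>mktemp -d</code> is not POSIX, and the <code>result</code>, <code>count</code>, <code>from_stdin</code> and <code>from_file</code> variables exist only for illustration.

```shell
#!/bin/sh
# Sketch: write an archive cpio-style with pax, then list it in both styles.
# Assumptions: pax is installed and mktemp -d exists; otherwise nothing runs.
result=skipped
count=0
if command -v pax >/dev/null 2>&1; then
    work=$(mktemp -d) || exit 1
    mkdir -p "$work/src/a_directory"
    echo hello >"$work/src/a_file"
    echo world >"$work/src/a_directory/another_file"

    # -w writes an archive; -d keeps pax from descending into directories
    # that find has already walked, so no entry is stored twice
    (cd "$work/src" && find a_directory a_file | pax -wd >"$work/archive.pax")

    # neither -r nor -w: pax lists the archive members
    from_stdin=$(pax <"$work/archive.pax")
    from_file=$(pax -f "$work/archive.pax")
    count=$(printf '%s\n' "$from_stdin" | wc -l | tr -d '[:space:]')

    if [ "$from_stdin" = "$from_file" ]; then
        result=ok
    else
        result=mismatch
    fi
    rm -rf "$work"
fi
echo "pax listing: $result, $count entries"
```

Dropping <code>d</code> from the write command should raise the entry count, since the directory's content would then be added once by find and once by pax.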
-<p>To extract the archive use one of:</p> -<pre> -$ pax -r <../archive.pax -$ pax -rf ../archive.pax -</pre> -<p>For selecting files to extract use the usual patterns:</p> -<pre> -$ pax -rf ../archive.pax a_file -$ pax -r a_directory/another_file <../archive.pax -</pre> -<p>That's all for the most basic use cases. There's more: for instance, pax supports a mode similar to the pass-through mode -we already know from cpio. But there is something more important to mention about pax. It's supposed to easily -support various formats. -<p>POSIX says that pax should support the pax, cpio and ustar formats. I installed GNU pax and it seems to support: ar, -bcpio, cpio, sv4cpio, sv4crc, tar and ustar. The default format for my installation is ustar, as you have probably -noticed in the verbose output in one of the examples above. The pax format is an extension of ustar; that's most likely the reason -it's usually omitted. -<p>You can select the format with the <code>-x</code> option; for supported formats please refer to your manual. Also note that -explicitly specifying the format should only be needed when writing an archive. When reading, pax can identify the archive's -format on its own:</p> -<pre> -$ find a_file a_directory | cpio -o >../archive.cpio -$ pax -vf ../archive.cpio --rw-rw-r-- 1 ignore ignore 0 Jul 22 22:30 a_file -drwxrwxr-x 2 ignore ignore 0 Jul 22 22:30 a_directory --rw-rw-r-- 1 ignore ignore 0 Jul 22 22:30 a_directory/another_file -pax: bcpio vol 1, 3 files, 512 bytes read, 0 bytes written. -</pre> - -<h2>Final thoughts</h2> -<p>Now then, it's time to finally wrap it all up. There is nothing left to say except: remember to always check your manual. -All of those utilities have various implementations that comply with POSIX to varying degrees. Don't be naive and -don't get tricked by them. I find pax the most reliable of them, as its "novelty" and an interface that was quite -"modern" from the start resulted in decently compliant implementations. 
Moreover, it includes nice things one may know -from both cpio and tar. Find a moment to check it out! -<p>Let's pretend that <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ar.html">ar</a> doesn't exist. -Thank you.</p> -<img src="archiving_with_posix_utilities-3.png" alt="boo!"> -</article> -<script src="https://stats.ignore.pl/track.js"></script> |