From ad76e9b885c9b9692074cf5b8b880cb79f8a48e0 Mon Sep 17 00:00:00 2001 From: Aki Date: Sun, 25 Jul 2021 19:17:40 +0200 Subject: Initialized website as git repository --- archiving_with_posix_utilities.html | 238 ++++++++++++++++++++++++++++++++++++ 1 file changed, 238 insertions(+) create mode 100644 archiving_with_posix_utilities.html (limited to 'archiving_with_posix_utilities.html') diff --git a/archiving_with_posix_utilities.html b/archiving_with_posix_utilities.html new file mode 100644 index 0000000..17ce7bc --- /dev/null +++ b/archiving_with_posix_utilities.html @@ -0,0 +1,238 @@ + + + + + + + + + +Archiving With POSIX Utilities + + + +
+

Archiving With POSIX Utilities

+

Published on 2020-07-22 22:30:00+02:00 +

The usual answer is tar. As you may have noticed, I intentionally linked to GNU Tar. If you are a *BSD user, then you use some other implementation. Both of them follow and extend the POSIX standard for the tar utility. Or so you would think.

Right now there is no POSIX tar utility. It was marked as legacy back in 1997 and disappeared from the standard soon after. Its place was taken by a behemoth called pax. The name gets even funnier when you consider the rationale and the size of this thing. But pax didn't come from tar alone. There was one more influence here, called cpio. You may know this one if you have ever tinkered with RPM packages or initramfs.

In other words, we have three utilities on today's table: tar, cpio and pax. According to Debian's popularity contest, they are installed in exactly that order of frequency, with tar in 8th place overall, cpio in 52nd, and pax in 6089th. I can't just talk about the least popular one, so I'll briefly explain how to use each of them in your usual Linux distribution while keeping in mind what POSIX had to tell us back in the day.

tar

+

As I've already mentioned, tarballs are the most popular. Not only that, they are commonly described as the easiest to use, although the interface is something that you can find jokes about. All operations on tarballs are handled by a single tar utility.

+box +

Let's go through three basic operations: creating an archive, listing its content, and extracting it. Tar expects its first argument to match this regular expression: [rxtuc][vwfblmo]*. The first character is the function, and the rest are modifiers. I'll focus only on those necessary to accomplish the aforementioned tasks.
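To make that pattern a bit more tangible, here is a small sketch that runs a few key strings through grep. The sample keys are my own picks for illustration; zcf is there only as an example of a popular key that relies on an extension.

```shell
#!/bin/sh
# Check sample tar "key" arguments against the pattern quoted above:
# one function letter followed by any number of modifier letters.
# The sample keys are illustrative picks, not taken from the standard.
pattern='[rxtuc][vwfblmo]*'

# -E selects extended regular expressions, -x requires a whole-line match.
accepted=$(printf '%s\n' cf tf xvf zcf cz | grep -Ex "$pattern")

# Print only the keys that fit the pattern.
echo "$accepted"
```

Keys such as zcf fall through because z is neither a function nor a modifier in that expression.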

To create an archive, run:

+
+$ tar cf ../archive.tar a_file a_directory
+
+

This will create an archive in the parent directory of the current working directory, containing a_file and, recursively, a_directory. Let's map every part of the command for clarity:

+
+
tar
Call tar
c
Create an archive
f
Use the first argument after cf as the path to the archive
../archive.tar
Path to the archive (without f it would be treated as another file to include in the archive)
a_file a_directory
Files to include in the archive
+

Now that you have an archive, you can see its content:

+
+$ tar tf ../archive.tar
+a_file
+a_directory/
+a_directory/another_file
+
+

As you have probably guessed, the t function writes out the names of the files in the archive. f works exactly the same way as before: the first argument after tf points to the archive file.

To extract everything from the archive, run:

+
+$ tar xf ../archive.tar
+
+

Or add more arguments to extract selected files:

+
+$ tar xf ../archive.tar a_file
+
+

This one will extract only a_file from the archive. +
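Put together, the three operations can be tried out safely in a throwaway directory. This is only a sketch: it assumes mktemp is available (it is not part of POSIX), and the work, src and out names are made up for the example.

```shell
#!/bin/sh
# Round trip with tar: create, list, then extract a single member.
# Everything happens inside a temporary directory that is removed on exit.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/src/a_directory" "$work/out"
echo "hello" > "$work/src/a_file"
echo "world" > "$work/src/a_directory/another_file"

cd "$work/src"
tar cf ../archive.tar a_file a_directory   # c: create, f: archive path follows

listing=$(tar tf ../archive.tar)           # t: list member names
echo "$listing"

cd "$work/out"
tar xf ../archive.tar a_file               # x: extract, limited to a_file
ls
```

Extracting into a separate directory makes it easy to see that only the selected member came out.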

That's pretty much it for tar. There are two more functions: r, which adds a new file to an existing archive, and u, which updates the file in the archive if it is already there and adds it otherwise. Note that the usual compression options are not available in POSIX; they are an extension.
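If you want to stay away from the compression options, one possible workaround is to pipe the archive through a separate compressor. The sketch below assumes gzip and mktemp are installed (neither comes from POSIX) and that your tar supports the r function; the file names are invented.

```shell
#!/bin/sh
# Sketch: append with r, then compress by piping through an external tool.
# gzip is an assumption; any filter reading stdin and writing stdout would do.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

echo "one" > first
echo "two" > second

tar cf archive.tar first        # start with a single member
tar rf archive.tar second       # r: append another file to the archive
members=$(tar tf archive.tar)
echo "$members"

# "-" as the archive name stands for standard output (or input when reading).
tar cf - first second | gzip > archive.tar.gz
compressed=$(gzip -dc archive.tar.gz | tar tf -)
echo "$compressed"
```

Keeping compression outside of tar means the same pipeline shape works with whatever compressor happens to be around.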

cpio

+

Heading off the usual routes, we encounter cpio. It's a more frequent sight than pax, but it's still quite niche compared to tar's omnipresence. Frankly, I like this one the most because of the way it takes its list of files as input. Sadly, this also makes it slightly bothersome to use.

Now, now, cpio operates in three modes: copy-out, copy-in and pass-through. Our goals are still the same: to create an archive, list the files inside, and extract it somewhere else. For that we'll only need the first two modes.

To create an archive, use the copy-out mode, as in: copy to the standard output:

+
+$ find a_file a_directory | cpio -o >../archive.cpio
+
+

You have probably noticed right away that cpio doesn't accept files as arguments. In copy-out mode it expects a list of files on standard input, and it writes the formatted archive to standard output. Here is a step-by-step explanation:

+
+
find a_file a_directory |
List files, directories and their content from arguments and pipe the output to the next command
cpio
Call cpio (duh!)
-o
Use copy-out mode
>../archive.cpio
Redirect standard output of cpio to a file
+

You now have an archive file called archive.cpio in the parent directory. To see its content, type:

+
+$ cpio -it <../archive.cpio
+a_file
+a_directory
+a_directory/another_file
+1 block
+
+

Nice! What's left is extraction. You do it with copy-in mode like this:

+
+$ cpio -i <../archive.cpio
+1 block
+
+

Huh? What's that? Listing files and extracting both use copy-in mode? That's right. Just as "copy-out" means "copy to standard output", "copy-in" can be understood as "copy from standard input". The t option prevents cpio from writing or creating any files; the archive is still read from standard input, but it is only translated into a list of files on standard output. Some extended implementations let you use t as the sole option and imply the copy-in mode.

You can also use patterns when extracting to select files:

+
+$ cpio -i a_file <../archive.cpio
+1 block
+
+

You can copy nested files if you use the d option:

+
+$ cpio -id a_directory/another_file <../archive.cpio
+1 block
+
+

This option tells cpio that it's allowed to create directories whenever it is necessary.
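Since cpio takes its file list from standard input, find can decide what ends up in the archive. Here is a rough sketch of a full round trip under that idea. It assumes cpio and mktemp are installed, skips itself otherwise, and uses made-up file names.

```shell
#!/bin/sh
# Sketch: let find pick the files for a cpio archive, then restore them.
# Skips quietly when cpio is not installed.
command -v cpio >/dev/null 2>&1 || { echo "cpio not found, skipping"; exit 0; }
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/src/a_directory" "$work/out"
echo "notes" > "$work/src/a_directory/notes.txt"
echo "junk"  > "$work/src/a_directory/junk.tmp"

cd "$work/src"
# Only *.txt files are listed, so only they are archived.
# The block count goes to standard error and is silenced here.
find a_directory -type f -name '*.txt' | cpio -o > ../archive.cpio 2>/dev/null

listing=$(cpio -it < ../archive.cpio 2>/dev/null)
echo "$listing"

cd "$work/out"
# The archive has no entry for a_directory itself, so d is needed
# to let cpio create it while extracting.
cpio -id < ../archive.cpio 2>/dev/null
```

Because find was asked for regular files only, the directory never appears in the archive, which is exactly the situation where d earns its keep.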

+pass-through +

Bonus! Pass-through mode can be used to copy files listed on standard input to a specified directory. It doesn't create an archive at all.

+
+$ ls ../destination
+$ ls
+a_directory  a_file
+$ find a_file a_directory | cpio -p ../destination
+0 blocks
+$ ls ../destination
+a_directory  a_file
+
+ +
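Pass-through mode gets more interesting once the file list is filtered. The following sketch copies only the matching files while keeping their directory layout. It assumes cpio and mktemp are present, and all the names in it are hypothetical.

```shell
#!/bin/sh
# Sketch: copy a filtered file list with pass-through mode, no archive involved.
# Skips quietly when cpio is not installed.
command -v cpio >/dev/null 2>&1 || { echo "cpio not found, skipping"; exit 0; }
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/src/a_directory" "$work/destination"
echo "keep" > "$work/src/a_directory/keep.txt"
echo "skip" > "$work/src/a_directory/skip.log"

cd "$work/src"
# -p: pass-through; -d: create a_directory in the destination, because find
# lists only the matching file and not its parent directory.
find a_directory -type f -name '*.txt' | cpio -pd "$work/destination" 2>/dev/null

copied=$(cd "$work/destination" && find . -type f)
echo "$copied"
```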

pax

+

Finally, at the destination! This one lives up to the name of this post as it's still part of POSIX. The fun part is +that you probably don't even have it installed, but don't worry, I didn't have it until like two days ago. It truly +feels like a compromise forced on you and your siblings by your parents. Jokes aside, I actually started to like it, +bulky but kind of cute. +

Anyway, let's see what this coffee machine can do for us; same goals as previously. This will be confusing, because +this utility is a compromise, and so it supports both usage styles: tar-like and cpio-like. +

To create an archive you can use either:

+ +
+$ pax -wf ../archive.pax a_directory a_file
+$ find a_file a_directory | pax -wd >../archive.pax
+$ find a_file a_directory | pax -wdf ../archive.pax
+
+ +

They are equivalent. You can mix the styles as much as you want; as long as it doesn't become a mess, it's quite handy. As for which option does what:

+ +
+
-w
Indicates that pax will act in write mode (tar's c and cpio's -o)
f ../archive.pax
The argument after f is the path to the archive. Note that it behaves slightly differently from tar: it always takes the next argument instead of the first path that appears after the flags. It means you can't put any options between -f and the path.
a_directory a_file
find a_file a_directory |
Both of these accomplish the same goal of letting pax know what files should be in the archive. They are mutually exclusive! If there is at least one argument pointing to a file, then standard input is not supposed to be read.
d
This one is used to prevent recursively adding files that are in a directory, so that the behaviour is the same as in cpio:
+$ find a_file a_directory | pax -wvf ../archive.pax
+a_directory
+a_directory/another_file
+a_directory/another_file
+a_file
+pax: ustar vol 1, 4 files, 0 bytes read, 10240 bytes written.
+$ find a_directory a_file | pax -wvdf ../archive.pax
+a_directory
+a_directory/another_file
+a_file
+pax: ustar vol 1, 3 files, 0 bytes read, 10240 bytes written.
+
+
+ +

The v option is used to increase the verbosity of the "error" output. You can find similar functionality in most command-line utilities, including tar and cpio.

To list the files in an archive, you can again use either style:

+
+$ pax <../archive.pax
+a_directory
+a_directory/another_file
+a_file
+$ pax -f ../archive.pax
+a_directory
+a_directory/another_file
+a_file
+
+

Yes, that's the default behaviour of pax, and you don't need to specify any argument (in the case of the cpio-like style). Sweet, isn't it?

To extract the archive use one of:

+
+$ pax -r <../archive.pax
+$ pax -rf ../archive.pax
+
+

For selecting files to extract use the usual patterns:

+
+$ pax -rf ../archive.pax a_file
+$ pax -r a_directory/another_file <../archive.pax
+
+
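For a quick test drive, the write, list and read modes can be chained in a scratch directory. Treat this as a sketch: it assumes pax and mktemp are installed and skips itself otherwise, and the src and out directories exist only for this example.

```shell
#!/bin/sh
# Round trip with pax in the tar-like style: write, list, read with a pattern.
# Skips quietly when pax is not installed.
command -v pax >/dev/null 2>&1 || { echo "pax not found, skipping"; exit 0; }
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/src/a_directory" "$work/out"
echo "hello" > "$work/src/a_file"
echo "world" > "$work/src/a_directory/another_file"

cd "$work/src"
pax -wf ../archive.pax a_file a_directory         # -w: write mode

listing=$(pax -f ../archive.pax)                  # neither -r nor -w: list mode
echo "$listing"

cd "$work/out"
pax -rf ../archive.pax a_directory/another_file   # -r: read mode, one pattern
```

Options come before the pattern here, which is the order the POSIX utility syntax guidelines ask for.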

That covers the most basic use cases. There's more; for instance, pax supports a mode similar to the pass-through mode we already know from cpio. But there is something more important to mention about pax: it's supposed to easily support various formats.
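Before getting to formats, here is what that pass-through-like mode (POSIX calls it copy mode) might look like in practice. It is a minimal sketch that assumes pax and mktemp are available; the directory names are invented for the example.

```shell
#!/bin/sh
# Sketch: pax copy mode, the counterpart of cpio's pass-through.
# Skips quietly when pax is not installed.
command -v pax >/dev/null 2>&1 || { echo "pax not found, skipping"; exit 0; }
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/src/a_directory" "$work/destination"
echo "hello" > "$work/src/a_file"
echo "world" > "$work/src/a_directory/another_file"

cd "$work/src"
# -r and -w together select copy mode: the last operand is the target
# directory, which has to exist already.
pax -rw a_file a_directory "$work/destination"

copied=$(cd "$work/destination" && find . -type f | sort)
echo "$copied"
```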

POSIX says that pax should support the pax, cpio and ustar formats. I installed GNU pax and it seems to support ar, bcpio, cpio, sv4cpio, sv4crc, tar and ustar. The default format for my installation is ustar, as you have probably noticed in the verbose output in one of the examples above. The pax format is an extension of ustar, which is most likely the reason it's usually omitted.

You can select the format with the -x option; for the supported formats, please refer to your manual. Also note that explicitly specifying the format should only be needed when writing an archive. When reading, pax can identify the archive's format by itself:

+
+$ find a_file a_directory | cpio -o >../archive.cpio
+$ pax -vf ../archive.cpio
+-rw-rw-r--  1 ignore   ignore    0 Jul 22 22:30 a_file
+drwxrwxr-x  2 ignore   ignore    0 Jul 22 22:30 a_directory
+-rw-rw-r--  1 ignore   ignore    0 Jul 22 22:30 a_directory/another_file
+pax: bcpio vol 1, 3 files, 512 bytes read, 0 bytes written.
+
+ +
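The same can be tried from the writing side. The sketch below writes one set of files in two formats and reads both back without naming the format. It assumes pax and mktemp are installed and that your pax accepts the ustar and cpio values for -x; the archive names are made up.

```shell
#!/bin/sh
# Sketch: pick the archive format with -x when writing, let pax detect it
# when reading. Skips quietly when pax is not installed.
command -v pax >/dev/null 2>&1 || { echo "pax not found, skipping"; exit 0; }
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/src/a_directory"
echo "hello" > "$work/src/a_file"
echo "world" > "$work/src/a_directory/another_file"
cd "$work/src"

# Write the same files twice, once per format.
pax -w -x ustar -f ../archive.ustar a_file a_directory
pax -w -x cpio  -f ../archive.cpio  a_file a_directory

# No -x when reading: pax works out the format on its own.
from_ustar=$(pax -f ../archive.ustar)
from_cpio=$(pax -f ../archive.cpio)
echo "$from_ustar"
echo "$from_cpio"
```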

Final thoughts

+

Now then, it's time to finally wrap it all up. There is nothing left to say except: remember to always check your manual. All of those utilities have various implementations that comply with POSIX to varying degrees. Don't be naive and don't get tricked by them. I find pax the most reliable of them, as its "novelty" and an interface that was quite "modern" from the start resulted in decently compliant implementations. Moreover, it includes nice things one may know from both cpio and tar. Find a moment to check it out!

Let's pretend that ar doesn't exist. Thank you.

+boo! +
+ -- cgit v1.1