From 8e1f3c9ebc0ccd132e3836f3d198415a15932877 Mon Sep 17 00:00:00 2001 From: Aki Date: Sun, 25 Jul 2021 19:46:25 +0200 Subject: Renamed guides to include "How To" in their names --- archiving_with_posix_utilities.html | 238 ------------------------------------ 1 file changed, 238 deletions(-) delete mode 100644 archiving_with_posix_utilities.html (limited to 'archiving_with_posix_utilities.html') diff --git a/archiving_with_posix_utilities.html b/archiving_with_posix_utilities.html deleted file mode 100644 index 17ce7bc..0000000 --- a/archiving_with_posix_utilities.html +++ /dev/null @@ -1,238 +0,0 @@ - - - - - - - - - -Archiving With POSIX Utilities - - - -
-

Archiving With POSIX Utilities

-

Published on 2020-07-22 22:30:00+02:00 -

The usual answer is tar. As you may see, I intentionally linked to GNU Tar. If you are a *BSD user, then you use some other implementation. Both of them follow and extend the POSIX standard for the tar utility. Or so you would think.

Right now there is no POSIX tar utility. It was marked as legacy as early as 1997 and disappeared from the standard soon after. Its place was taken by a behemoth called pax. The name gets even funnier when you consider the rationale and the size of this thing. But pax didn't come from tar alone. There was one more influence here, called cpio. You may know this one if you ever tinkered with RPM packages or initramfs.

In other words, we have three utilities on today's table: tar, cpio and pax. According to Debian's popularity contest, the frequency of each being installed follows the exact same order, with tar at 8th place overall, cpio at 52nd, and pax at 6089th. I can't just talk about the least popular one, so I'll briefly explain how to use each of them in your usual Linux distribution while keeping in mind what POSIX had to tell us back in the day.

tar

-

As I've already mentioned, tarballs are the most popular. Not only that, they are commonly described as the easiest to use, although the interface is something that you can find jokes about. All operations on tarballs are handled via a single tar utility.

-box -

Let's go through three basic operations: create an archive, list its content, and extract it. Tar expects its first argument to match this regular expression: [rxtuc][vwfblmo]*. The first part is a function, and the second is a set of modifiers. I'll focus only on those necessary to accomplish the aforementioned tasks.
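If you want to convince yourself what that pattern accepts, here is a small sketch. The candidate strings are just my own picks for illustration, and grep stands in for tar's argument parsing, which is of course not how tar checks it internally.

```shell
# Feed a few candidate first arguments to grep and keep only those that
# match the whole pattern (-x) as an extended regular expression (-E).
matches=$(printf '%s\n' cf tf xvf fc z | grep -E -x '[rxtuc][vwfblmo]*')

# Prints cf, tf and xvf; "fc" and "z" do not start with a function letter.
printf '%s\n' "$matches"
```

The point is simply that the function letter has to come first; modifiers such as f or v can only follow it.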

To create an archive you:

-
-$ tar cf ../archive.tar a_file a_directory
-
-

This will create an archive located in the parent directory of the current working directory, containing a_file and, recursively, a_directory. Let's map every part of the command for clarity:

-
-
tar
Call tar -
c
Create an archive -
f
Use first argument after cf as the path to the archive -
../archive.tar
Path to the archive (without f it would be treated as another file to include in the archive)
a_file a_directory
Files to include in the archive
-

Now that you have an archive, you can see its content:

-
-$ tar tf ../archive.tar
-a_file
-a_directory/
-a_directory/another_file
-
-

As you have probably guessed, the t function is used to write the names of the files that are in the archive. f works exactly the same way: the first argument after tf is meant to point to the archive file.

To extract everything from the archive you:

-
-$ tar xf ../archive.tar
-
-

Or add more arguments to extract selected files:

-
-$ tar xf ../archive.tar a_file
-
-

This one will extract only a_file from the archive. -

That's pretty much it about tar. There are two more functions: r, which adds a new file to an existing archive, and u, which updates the file in the archive if it exists and adds it if it doesn't. Note that the usual compression options are not available in POSIX; they are an extension.
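To tie the tar functions together, here is a rough round trip in a scratch directory. Everything about the setup is an assumption of mine: the late_file name is made up, mktemp -d is not strictly POSIX, and the r function may be missing in very minimal tar implementations.

```shell
# Scratch area so the example does not touch real files (assumed layout).
work=$(mktemp -d)
mkdir -p "$work/src/a_directory" "$work/out"
cd "$work/src"
echo one > a_file
echo two > a_directory/another_file
echo three > late_file                    # hypothetical file added later

tar cf ../archive.tar a_file a_directory  # c: create the archive
tar rf ../archive.tar late_file           # r: append to the existing archive
listing=$(tar tf ../archive.tar)          # t: list member names
printf '%s\n' "$listing"

cd "$work/out"
tar xf ../archive.tar a_file late_file    # x: extract only the named members
ls
```

Since only two members are named on extraction, a_directory should not appear in the output directory.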

cpio

-

Heading off the usual routes, we encounter cpio. It's a more frequent sight than pax, but it is still quite niche compared to tar's omnipresence. Frankly, I like this one the most because of the way it handles input of file lists. Sadly, this also makes it slightly bothersome to use.

Now, now, cpio operates in three modes: copy-out, copy-in and pass-through. Our goals are still the same: to create an archive, list the files inside, and extract it somewhere else. For that we'll only need the first two modes.

To create an archive, use the copy-out mode, as in: copy to the standard output:

-
-$ find a_file a_directory | cpio -o >../archive.cpio
-
-

You have probably noticed right away that cpio doesn't accept files as arguments. In copy-out mode it expects a list of files on standard input, and it writes the formatted archive to standard output. Here is a more or less step-by-step explanation:

-
-
find a_file a_directory |
List files, directories and their content from arguments and pipe the output to the next command
cpio
Call cpio (duh!) -
-o
Use copy-out mode -
>../archive.cpio
Redirect standard output of cpio to a file -
-

You now have an archive file called archive.cpio in the parent directory. To see its content, type in:

-
-$ cpio -it <../archive.cpio
-a_file
-a_directory
-a_directory/another_file
-1 block
-
-

Nice! What's left is extraction. You do it with copy-in mode like this:

-
-$ cpio -i <../archive.cpio
-1 block
-
-

Huh? What's that? Listing files and extracting both use copy-in mode? That's right. Just as "copy-out" means "copy to standard output", "copy-in" can be understood as "copy from standard input". The t option prevents cpio from writing or creating any files; nonetheless, the archive is read from standard input and translated into a list of files on standard output. Some extended implementations let you use t as the sole option and imply the copy-in mode.

You can also use patterns when extracting to select files:

-
-$ cpio -i a_file <../archive.cpio
-1 block
-
-

You can copy nested files if you use the d option:

-
-$ cpio -id a_directory/another_file <../archive.cpio
-1 block
-
-

This option tells cpio that it's allowed to create directories whenever it is necessary.
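Put together, the copy-out and copy-in steps look roughly like this. It's only a sketch: the scratch directory layout is my own assumption, mktemp -d is not strictly POSIX, and the whole thing is skipped if cpio isn't installed.

```shell
# Only run when cpio is actually available on this system.
if command -v cpio >/dev/null 2>&1; then
    work=$(mktemp -d)                      # assumed scratch location
    mkdir -p "$work/src/a_directory" "$work/out"
    cd "$work/src"
    echo one > a_file
    echo two > a_directory/another_file

    # copy-out: file names on standard input, archive on standard output
    find a_file a_directory | cpio -o > ../archive.cpio

    # copy-in with t: only list what is inside
    listing=$(cpio -it < ../archive.cpio)
    printf '%s\n' "$listing"

    # copy-in with d: extract one nested file, creating a_directory on the way
    cd "$work/out"
    cpio -id a_directory/another_file < ../archive.cpio
fi
```

Without d the last command would have nowhere to put another_file, because a_directory does not exist in the output directory yet.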

-pass-through -

Bonus! Pass-through mode can be used to copy files listed on standard input to a specified directory. It doesn't create an archive at all.

-
-$ ls ../destination
-$ ls
-a_directory  a_file
-$ find a_file a_directory | cpio -p ../destination
-0 blocks
-$ ls ../destination
-a_directory  a_file
-
- -

pax

-

Finally, at the destination! This one lives up to the name of this post as it's still part of POSIX. The fun part is that you probably don't even have it installed, but don't worry, I didn't have it until like two days ago. It truly feels like a compromise forced on you and your siblings by your parents. Jokes aside, I actually started to like it: bulky, but kind of cute.

Anyway, let's see what this coffee machine can do for us; same goals as previously. This will be confusing, because this utility is a compromise, and so it supports both usage styles: tar-like and cpio-like.

To create an archive you can use either:

- -
-$ pax -wf ../archive.pax a_directory a_file
-$ find a_file a_directory | pax -wd >../archive.pax
-$ find a_file a_directory | pax -wdf ../archive.pax
-
- -

They are equivalent. You can mix the styles as much as you want; as long as it doesn't become a mess, it's quite handy. As for which option does what:

- -
-
-w
Indicates that pax will act in write mode (tar's c and cpio's -o) -
f ../archive.pax
The argument after f is the path to the archive. Note that it behaves slightly differently from tar: it always takes the next argument instead of the first path that appears after the flags. This means you can't put any options between -f and the path.
a_directory a_file -
find a_file a_directory |
Both of these accomplish the same goal of letting pax know which files should be in the archive. They are mutually exclusive! If there is at least one argument pointing to a file, then standard input is not supposed to be read.
d
This one prevents pax from recursively adding files that are in a directory, so that the behaviour is the same as in cpio:
-$ find a_directory a_file | pax -wvf ../archive.pax
-a_directory
-a_directory/another_file
-a_directory/another_file
-a_file
-pax: ustar vol 1, 4 files, 0 bytes read, 10240 bytes written.
-$ find a_directory a_file | pax -wvdf ../archive.pax
-a_directory
-a_directory/another_file
-a_file
-pax: ustar vol 1, 3 files, 0 bytes read, 10240 bytes written.
-
-
- -

The v option increases the verbosity of the "error" output. You can find similar functionality in most command line utilities, including tar and cpio.

To list the files that are in the archive you can also use both styles:

-
-$ pax <../archive.pax
-a_directory
-a_directory/another_file
-a_file
-$ pax -f ../archive.pax
-a_directory
-a_directory/another_file
-a_file
-
-

Yes, that's the default behaviour of pax and you don't need to specify any argument (in case of cpio-like style). -Sweet, isn't it? -

To extract the archive use one of:

-
-$ pax -r <../archive.pax
-$ pax -rf ../archive.pax
-
-

For selecting files to extract use the usual patterns:

-
-$ pax -r a_file -f ../archive.pax
-$ pax -r a_directory/another_file <../archive.pax
-
-

That's all for the most basic use cases. There's more: for instance, pax supports a mode similar to the pass-through mode we already know from cpio. But there is something more important to mention about pax. It's supposed to easily support various formats.
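For the curious, that pass-through-like mode is what you get when -r and -w are given together. A minimal sketch, assuming pax is installed and using a scratch layout of my own invention (mktemp -d is not strictly POSIX either):

```shell
# Only run when pax is actually available on this system.
if command -v pax >/dev/null 2>&1; then
    work=$(mktemp -d)                      # assumed scratch location
    mkdir -p "$work/src/a_directory" "$work/destination"
    cd "$work/src"
    echo one > a_file
    echo two > a_directory/another_file

    # -r and -w together select copy mode: no archive is written and
    # the last operand is the (already existing) destination directory.
    pax -rw a_file a_directory ../destination
    ls ../destination
fi
```

Just like with cpio -p, nothing resembling an archive file is left behind; the files simply land in the destination.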

POSIX says that pax should support the pax, cpio and ustar formats. I installed GNU pax and it seems to support: ar, bcpio, cpio, sv4cpio, sv4crc, tar and ustar. The default format for my installation is ustar, as you have probably noticed in the verbose output in one of the examples above. The pax format is an extension of ustar; that's most likely the reason it's usually omitted.

You can select the format with the -x option; for supported formats please refer to your manual. Also note that explicitly specifying the format should only be needed when writing an archive. When reading, pax can identify the archive's format on its own:

-
-$ find a_file a_directory | cpio -o >../archive.cpio
-$ pax -vf ../archive.cpio
--rw-rw-r--  1 ignore   ignore    0 Jul 22 22:30 a_file
-drwxrwxr-x  2 ignore   ignore    0 Jul 22 22:30 a_directory
--rw-rw-r--  1 ignore   ignore    0 Jul 22 22:30 a_directory/another_file
-pax: bcpio vol 1, 3 files, 512 bytes read, 0 bytes written.
-
- -
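Writing with an explicit format could look like the sketch below. I'm assuming an implementation that accepts the cpio and ustar names required by POSIX; the scratch directory is my own invention and the example is skipped when pax isn't installed.

```shell
# Only run when pax is actually available on this system.
if command -v pax >/dev/null 2>&1; then
    work=$(mktemp -d)                      # assumed scratch location
    mkdir -p "$work/src/a_directory"
    cd "$work/src"
    echo one > a_file
    echo two > a_directory/another_file

    # Pick the archive format explicitly when writing...
    pax -w -x cpio  -f ../archive.cpio a_file a_directory
    pax -w -x ustar -f ../archive.tar  a_file a_directory

    # ...and let pax detect it by itself when reading.
    listing=$(pax -f ../archive.cpio)
    printf '%s\n' "$listing"
    pax -f ../archive.tar
fi
```

Note that -x only shows up on the writing side; both listings are requested the same way regardless of format.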

Final thoughts

-

Now then, it's time to finally wrap it all up. There is nothing left to say but remember to always check your manual: all of those utilities have various implementations that are compliant with POSIX to varying degrees. Don't be naive and don't get tricked by them. I find pax the most reliable of them, as its "novelty" and an interface that was quite "modern" from the start resulted in decently compliant implementations. Moreover, it includes nice things one may know from both cpio and tar. Find a moment to check it out!

Let's pretend that ar doesn't exist. -Thank you.

-boo! -