<!doctype html>
<html lang="en">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="author" content="aki">
<meta name="tags" content="posix, linux, tutorial, archiving, tar, cpio, pax">
<link rel="icon" type="image/png" href="cylo.png">
<link rel="stylesheet" type="text/css" href="style.css">
<title>Archiving With POSIX Utilities</title>
<nav><p><a href="https://ignore.pl">ignore.pl</a></p></nav>
<article>
<h1>Archiving With POSIX Utilities</h1>
<p class="subtitle">Published on 2020-07-22 22:30:00+02:00
<p>The usual answer is <a href="https://www.gnu.org/software/tar/">tar</a>. As you may see, I intentionally linked to
GNU Tar. If you are a *BSD user then you use some other implementation. Both of them follow and extend the POSIX
standard for the tar utility. Or so you would think.
<p>Right now there is no POSIX tar utility. It was marked as legacy
<a href="https://pubs.opengroup.org/onlinepubs/007908799/xcu/tar.html">already in 1997</a> and disappeared from the
standard soon after. Its place was taken by a behemoth called
<a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html">pax</a>. The name gets even funnier when
you consider the rationale and the size of this thing. But pax didn't come from just tar. There was one more influence
here, called <a href="https://pubs.opengroup.org/onlinepubs/007908799/xcu/cpio.html">cpio</a>. You may know this one
if you ever tinkered with RPM packages or initramfs.
<p>In other words we have three utilities on today's table: tar, cpio and pax. According to
<a href="https://popcon.debian.org/by_inst">Debian's popularity contest</a> the frequency of each being installed is in
the exact same order, with tar being at 8th place overall, cpio at 52nd, and pax at 6089th. I can't just talk about the
least popular one, so I'll explain shortly how to use each of them in your usual Linux distribution while keeping in
mind what POSIX had to tell us back in the day.
<h2>tar</h2>
<p>As I've already mentioned, tarballs are the most popular. Not only that, they are commonly described as the easiest
to use, although the interface is something that you can find jokes about. All operations on tarballs are handled by
a single tar utility.</p>
<img src="archiving_with_posix_utilities-1.png" alt="box">
<p>Let's go through three basic operations: create an archive, list its content, and extract it. Tar expects its
first argument to match this regular expression: <code>[rxtuc][vwfblmo]*</code>. The first part is the
<em>function</em>, and the second is a list of <em>modifiers</em>. I'll focus only on those necessary to accomplish the
aforementioned tasks.
<p>To create an archive, run:</p>
<pre>
$ tar cf ../archive.tar a_file a_directory
</pre>
<p>This will create an archive located in the parent directory of the current working directory, containing
<code>a_file</code> and, recursively, <code>a_directory</code>. Let's map every part of the command for clarity:</p>
<dl>
<dt><code>tar</code><dd>Call tar
<dt><code>c</code><dd>Create an archive
<dt><code>f</code><dd>Use the first argument after <code>cf</code> as the path to the archive
<dt><code>../archive.tar</code><dd>Path to the archive (without <code>f</code> it would be treated as another file to
include in the archive)
<dt><code>a_file a_directory</code><dd>Files to include in the archive
</dl>
<p>Now that you have an archive, you can see its content:</p>
<pre>
$ tar tf ../archive.tar
a_file
a_directory/
a_directory/another_file
</pre>
<p>As you have probably guessed, the <code>t</code> function writes the names of the files that are in the archive.
<code>f</code> works exactly the same way: the first argument after <code>tf</code> points to the archive file.
<p>To extract everything from the archive, run:</p>
<pre>
$ tar xf ../archive.tar
</pre>
<p>Or add more arguments to extract selected files:</p>
<pre>
$ tar xf ../archive.tar a_file
</pre>
<p>This one will extract only <code>a_file</code> from the archive.
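<p>To try all three functions without touching real data, here is a minimal sketch of a full round trip. The scratch
directory name <code>tar_demo</code> and the file contents are my assumptions for illustration; everything else uses
only the <code>c</code>, <code>t</code>, <code>x</code> and <code>f</code> letters described above.</p>

```shell
# Throwaway location; the tar_demo name is made up for this example.
scratch="${TMPDIR:-/tmp}/tar_demo"
rm -rf "$scratch"
mkdir -p "$scratch/src/a_directory" "$scratch/out"
cd "$scratch/src"
echo "hello" > a_file
echo "world" > a_directory/another_file

# c: create, f: the next argument is the archive path.
tar cf ../archive.tar a_file a_directory

# t: write the member names, extract nothing.
tar tf ../archive.tar > ../listing.txt
cat ../listing.txt

# x: extract; the trailing operand limits extraction to a_file.
cd ../out
tar xf ../archive.tar a_file
ls
```

<p>Extracting into a separate <code>out</code> directory makes it easy to see that only the selected file came back.</p>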
<p>That's pretty much it for tar. There are two more functions: <code>r</code>, which appends new files to an existing
archive, and <code>u</code>, which adds a file only if it is not in the archive yet or has been modified since it was
archived. Note that the usual compression options are not available in POSIX; they are an extension.
<h2>cpio</h2>
<p>Heading off from the usual routes we encounter cpio. It's a more frequent sight than pax, but it still is quite niche
compared to tar's omnipresence. Frankly, I like this one the most because of the way it handles input of file lists.
Sadly, this also makes it slightly bothersome to use.
<p>Now, now, cpio operates in three modes: <em>copy-out</em>, <em>copy-in</em> and <em>pass-through</em>. Our goals are
still the same: to create an archive, list the files inside, and extract it somewhere else. For that we'll only need
the first two modes.
<p>To create an archive, use the copy-out mode, as in: <em>copy</em> to the standard <em>out</em>put:</p>
<pre>
$ find a_file a_directory | cpio -o >../archive.cpio
</pre>
<p>You have probably noticed right away that cpio doesn't accept files as arguments. In copy-out mode it expects a list
of files on standard input, and it writes the formatted archive to standard output. Here is a more or less step-by-step
explanation:</p>
<dl>
<dt><code>find a_file a_directory |</code><dd>List files, directories and their content from arguments and pipe the
output to the next command
<dt><code>cpio</code><dd>Call cpio (duh!)
<dt><code>-o</code><dd>Use copy-out mode
<dt><code>>../archive.cpio</code><dd>Redirect standard output of cpio to a file
</dl>
<p>You now have an archive file called <code>archive.cpio</code> in the parent directory. To see its content, type:</p>
<pre>
$ cpio -it <../archive.cpio
a_file
a_directory
a_directory/another_file
1 block
</pre>
<p>Nice! What's left is extraction. You do it with copy-in mode like this:</p>
<pre>
$ cpio -i <../archive.cpio
1 block
</pre>
<p>Huh? What's that? Listing files and extracting both use copy-in mode? That's right. Just as "copy-out" means "copy to
standard output", "copy-in" can be understood as "copy from standard input". The <code>t</code> option prevents cpio
from writing or creating any files; the archive is still read from standard input and then translated into a list of
files on standard output. Some extended implementations let you use <code>t</code> as the sole option and imply
copy-in mode.
<p>You can also use patterns when extracting to select files:</p>
<pre>
$ cpio -i a_file <../archive.cpio
1 block
</pre>
<p>You can copy nested files if you use the <code>d</code> option:</p>
<pre>
$ cpio -id a_directory/another_file <../archive.cpio
1 block
</pre>
<p>This option tells cpio that it's allowed to create directories whenever it is necessary.</p>
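<p>Put together, the copy-out and copy-in steps can be sketched as one round trip. The scratch directory
<code>cpio_demo</code> is my assumption, and the whole thing is wrapped in a check because cpio may not be installed
on your system.</p>

```shell
if command -v cpio >/dev/null 2>&1; then
    # Throwaway location; the cpio_demo name is made up for this example.
    scratch="${TMPDIR:-/tmp}/cpio_demo"
    rm -rf "$scratch"
    mkdir -p "$scratch/src/a_directory" "$scratch/out"
    cd "$scratch/src"
    echo "hello" > a_file
    echo "world" > a_directory/another_file

    # Copy-out: file names on standard input, archive on standard output.
    find a_file a_directory | cpio -o > ../archive.cpio

    # Copy-in with t: only list the names, write no files.
    cpio -it < ../archive.cpio > ../listing.txt
    cat ../listing.txt

    # Copy-in with d: extract one nested file, creating a_directory as needed.
    cd ../out
    cpio -id a_directory/another_file < ../archive.cpio
    ls -R
else
    echo "cpio is not installed; skipping" >&2
fi
```

<p>The block count that cpio reports goes to the error output, so it doesn't end up in <code>listing.txt</code>.</p>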
<img src="archiving_with_posix_utilities-2.png" alt="pass-through">
<p>Bonus! Pass-through mode can be used to copy files listed on standard input to a specified directory. It doesn't
create an archive at all.</p>
<pre>
$ ls ../destination
$ ls
a_directory a_file
$ find a_file a_directory | cpio -p ../destination
0 blocks
$ ls ../destination
a_directory a_file
</pre>
<h2>pax</h2>
<p>Finally, at the destination! This one lives up to the name of this post as it's still part of POSIX. The fun part is
that you probably don't even have it installed, but don't worry, I didn't have it until like two days ago. It truly
feels like a compromise forced on you and your siblings by your parents. Jokes aside, I actually started to like it,
bulky but kind of cute.
<p>Anyway, let's see what this coffee machine can do for us; same goals as previously. This will be confusing, because
this utility is a compromise, and so it supports both usage styles: tar-like and cpio-like.
<p>To create an archive you can use either:</p>
<pre>
$ pax -wf ../archive.pax a_directory a_file
$ find a_file a_directory | pax -wd >../archive.pax
$ find a_file a_directory | pax -wdf ../archive.pax
</pre>
<p>They are equivalent. You can mix the styles as much as you want; as long as it doesn't become a mess, it's quite
handy. As for which option does what:</p>
<dl>
<dt><code>-w</code><dd>Indicates that pax will act in write mode (tar's <code>c</code> and cpio's <code>-o</code>)
<dt><code>f ../archive.pax</code><dd>The argument after <code>f</code> is the path to the archive; note that it behaves
slightly differently from tar: it always takes the next argument instead of the first path that appears after the
flags. This means you can't put any options between <code>-f</code> and the path.
<dt><code>a_directory a_file</code>
<dt><code>find a_file a_directory |</code><dd>Both of these accomplish the same goal of letting <code>pax</code> know
which files should be in the archive. They are mutually exclusive! If there is at least one argument pointing to a file,
then standard input is not supposed to be read.
<dt><code>d</code><dd>This one prevents pax from recursively adding the files that are in a directory, so that the
behaviour is the same as in cpio:
<pre>
$ find a_file a_directory | pax -wvf ../archive.pax
a_directory
<span style="color: red">a_directory/another_file
a_directory/another_file</span>
a_file
pax: ustar vol 1, 4 files, 0 bytes read, 10240 bytes written.
$ find a_directory a_file | pax -wv<span style="color: green">d</span>f ../archive.pax
a_directory
<span style="color: green">a_directory/another_file</span>
a_file
pax: ustar vol 1, 3 files, 0 bytes read, 10240 bytes written.
</pre>
</dl>
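<p>If you want to check the effect of <code>d</code> yourself, a rough sketch is to count the members after writing
with it. The scratch directory <code>pax_d_demo</code> is my assumption, and the block is skipped when pax is not
installed.</p>

```shell
if command -v pax >/dev/null 2>&1; then
    # Throwaway location; the pax_d_demo name is made up for this example.
    scratch="${TMPDIR:-/tmp}/pax_d_demo"
    rm -rf "$scratch"
    mkdir -p "$scratch/src/a_directory"
    cd "$scratch/src"
    echo "hello" > a_file
    echo "world" > a_directory/another_file

    # find already lists a_directory/another_file, so -d stops pax from
    # descending into a_directory and adding that file a second time.
    find a_file a_directory | pax -wdf ../archive.pax

    # List mode prints one line per member; count them.
    pax -f ../archive.pax | wc -l
else
    echo "pax is not installed; skipping" >&2
fi
```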
<p>The <code>v</code> option is used to increase verbosity of the "error" output. You can find similar functionality in
most command line utilities, including tar and cpio.
<p>To list files that are in archive you can also use both styles:</p>
<pre>
$ pax <../archive.pax
a_directory
a_directory/another_file
a_file
$ pax -f ../archive.pax
a_directory
a_directory/another_file
a_file
</pre>
<p>Yes, that's the default behaviour of pax and you don't need to specify any argument (in the case of the cpio-like
style). Sweet, isn't it?
<p>To extract the archive, use one of:</p>
<pre>
$ pax -r <../archive.pax
$ pax -rf ../archive.pax
</pre>
<p>For selecting files to extract use the usual patterns:</p>
<pre>
$ pax -rf ../archive.pax a_file
$ pax -r a_directory/another_file <../archive.pax
</pre>
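<p>The write, list and read steps can be sketched as one round trip in the tar-like style. The scratch directory
<code>pax_demo</code> is my assumption, and the block is skipped when pax is not installed.</p>

```shell
if command -v pax >/dev/null 2>&1; then
    # Throwaway location; the pax_demo name is made up for this example.
    scratch="${TMPDIR:-/tmp}/pax_demo"
    rm -rf "$scratch"
    mkdir -p "$scratch/src/a_directory" "$scratch/out"
    cd "$scratch/src"
    echo "hello" > a_file
    echo "world" > a_directory/another_file

    # Write mode: operands name the files to archive.
    pax -wf ../archive.pax a_directory a_file

    # Neither -r nor -w: list mode.
    pax -f ../archive.pax > ../listing.txt
    cat ../listing.txt

    # Read mode: extract only the member matching the pattern.
    cd ../out
    pax -rf ../archive.pax a_directory/another_file
    ls -R
else
    echo "pax is not installed; skipping" >&2
fi
```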
<p>That covers the most basic use cases. There's more; for instance, pax supports a mode similar to the pass-through
mode we already know from cpio. But there is something more important to mention about pax. It's supposed to easily
support various different formats.
<p>POSIX says that pax should support the pax, cpio and ustar formats. I installed GNU pax and it seems to support: ar,
bcpio, cpio, sv4cpio, sv4crc, tar and ustar. The default format for my installation is ustar, as you have probably
noticed in the verbose output in one of the examples above. The pax format is an extension of ustar, which is most
likely the reason it's usually omitted.
<p>You can select the format with the <code>-x</code> option; for supported formats please refer to your manual. Also
note that explicitly specifying the format should only be needed when writing an archive. When reading, pax can
identify the archive's format on its own:</p>
<pre>
$ find a_file a_directory | cpio -o >../archive.cpio
$ pax -vf ../archive.cpio
-rw-rw-r-- 1 ignore ignore 0 Jul 22 22:30 a_file
drwxrwxr-x 2 ignore ignore 0 Jul 22 22:30 a_directory
-rw-rw-r-- 1 ignore ignore 0 Jul 22 22:30 a_directory/another_file
pax: bcpio vol 1, 3 files, 512 bytes read, 0 bytes written.
</pre>
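<p>A minimal sketch of <code>-x</code> in use: write the same files once as cpio and once as ustar, then list both
without naming the format. The scratch directory <code>pax_format_demo</code> is my assumption, and I picked these two
formats only because POSIX requires them; check your manual for the rest.</p>

```shell
if command -v pax >/dev/null 2>&1; then
    # Throwaway location; the pax_format_demo name is made up for this example.
    scratch="${TMPDIR:-/tmp}/pax_format_demo"
    rm -rf "$scratch"
    mkdir -p "$scratch/src/a_directory"
    cd "$scratch/src"
    echo "hello" > a_file
    echo "world" > a_directory/another_file

    # -x picks the archive format when writing.
    pax -w -x cpio -f ../archive.cpio a_file a_directory
    pax -w -x ustar -f ../archive.tar a_file a_directory

    # No -x needed when reading: pax works out the format by itself.
    pax -f ../archive.cpio
    pax -f ../archive.tar
else
    echo "pax is not installed; skipping" >&2
fi
```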
<h2>Final thoughts</h2>
<p>Now then, it's time to finally wrap it all up. There is nothing left to say except: remember to always check your
manual. All of those utilities have various implementations that comply with POSIX to varying degrees. Don't be naive
and don't get tricked by them. I find pax the most reliable of them, as its "novelty" and an interface that was quite
"modern" from the start resulted in decently compliant implementations. Moreover, it includes nice things one may know
from both cpio and tar. Find a moment to check it out!
<p>Let's pretend that <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ar.html">ar</a> doesn't exist.
Thank you.</p>
<img src="archiving_with_posix_utilities-3.png" alt="boo!">
</article>
<script src="https://stats.ignore.pl/track.js"></script>