Diffstat (limited to 'how_to_compress_files_in_posix.html')
-rw-r--r-- | how_to_compress_files_in_posix.html | 117 |
1 files changed, 117 insertions, 0 deletions
diff --git a/how_to_compress_files_in_posix.html b/how_to_compress_files_in_posix.html
new file mode 100644
index 0000000..2ef743f
--- /dev/null
+++ b/how_to_compress_files_in_posix.html
@@ -0,0 +1,117 @@
+<!doctype html>
+<html lang="en">
+<meta charset="utf-8">
+<meta name="viewport" content="width=device-width, initial-scale=1">
+<meta name="author" content="aki">
+<meta name="tags" content="POSIX, compression, archiving, tutorial, guide, howto">
+<link rel="icon" type="image/png" href="cylo.png">
+<link rel="stylesheet" href="style.css">
+
+<title>How to Compress Files in POSIX</title>
+
+<nav><p><a href="https://ignore.pl">ignore.pl</a></p></nav>
+
+<article>
+<h1>How to Compress Files in POSIX</h1>
+<p class="subtitle">Published on 2021-08-14 19:48:00+02:00
+<p>This was quite an amusing one to read about. On one hand, the results kind of surprised me, but then, on the other...
+What exactly did I expect?
+<p>Anyway! How does one compress files in a POSIX-compliant system?
+<p>By the power of Hinchliffe's rule, I say: you don't. <i>Wait, what kind of tutorial is this</i>?</p>
+
+
+<h2>The standard way</h2>
+<img src="how_to_compress_files_in_posix-1.png" alt="the way">
+<p>POSIX defines three utilities for compression but let's focus on two of them:
+<a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/compress.html">compress</a> and
+<a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/uncompress.html">uncompress</a>. They have quite
+descriptive names and are incredibly simple to use. Just give them the names of the files to process and they will do
+the work. The result is stored in a <code>*.Z</code> file:
+
+<pre>
+$ ls
+archive.tar
+$ <mark>compress archive.tar</mark>
+$ ls
+archive.tar.Z
+$ <mark>uncompress archive.tar.Z</mark>
+$ ls
+archive.tar
+</pre>
+
+<p>By default, the input file is replaced by the output.
+This can be avoided with the <code>-c</code> option, which writes
+the result to standard output instead:
+
+<pre>
+$ compress <mark>-c</mark> archive.tar <mark>>archive.tar.Z</mark>
+$ uncompress <mark>-c</mark> archive.tar.Z <mark>>archive.tar.bak</mark>
+$ ls
+archive.tar archive.tar.bak archive.tar.Z
+</pre>
+
+<p>And of course standard input can be used as well with the special filename: <code>-</code>.
+
+<p>So far it all looks good. That's because we're discussing an imaginary implementation of these utilities, the way
+they are described by the POSIX standard. However, that's not the real world.
+
+
+<h2>The actual way</h2>
+<p>If you are just like me and come from more of a Linux background, then prepare for disappointment. If you are coming
+from BSD, then I have great news for you: you can stop here, because your system actually implements the standard.</p>
+
+<img src="how_to_compress_files_in_posix-2.png" alt="the other way">
+
+<p>Instead of <b>compress</b>, most Linux distributions come with <b>gzip</b>(1). Usually, it is
+<a href="https://www.gnu.org/software/gzip/">GNU Gzip</a>. The reason for that is, of course, legal: patent
+issues. The full reasoning is covered in <a href="https://www.gnu.org/philosophy/gif.html">No GIF Files</a>. However,
+this is all in the past, because the LZW patents have already expired.
+<p>Let's put the story and reasons aside. What we have is an inability to conform to a standard due to legal reasons,
+and this inability became a standard. And so in Linux systems you will end up using <b>gzip</b> or <b>xz</b>(1), or
+<b>bzip2</b>(1), or really anything else:
+
+<pre>
+$ ls
+archive.tar
+$ <mark>gzip archive.tar</mark>
+$ ls
+archive.tar.gz
+$ <mark>gzip -d archive.tar.gz</mark>
+$ ls
+archive.tar
+</pre>
+
+<p>You can replace <code>gzip</code> with any of the mentioned utilities - they have very similar interfaces.
+Not only that, they are also partially compatible with the interface for <b>compress</b> defined by the POSIX standard.
+Each of them has an additional <i>un</i> command (e.g., <code>gunzip</code>, <code>unxz</code>) that can be used
+instead of the <code>-d</code> option. If you feel adventurous you could try symlinking them (especially <b>gzip</b>,
+since it can decompress the LZW-based <code>*.Z</code> format, although it cannot produce it).
+<p>This raises the question of format compatibility, but that comparison is big enough to have its own article.
+<p>In the end, if you want to use POSIX <b>compress</b> in GNU/Linux - you don't. Unless...
+
+
+<h2>The other way</h2>
+<p>Unless you use <a href="https://github.com/vapier/ncompress">ncompress</a>, which has both <b>compress</b>(1) and
+<b>uncompress</b>(1). What's more, it descends directly from the original implementation. But there is one thing you
+need to know about it.
+<p>It's bad. Yes, a detailed comparison of compression algorithms is yet another huge and interesting topic, but this
+particular case really can be summed up in: it's bad. It's OK with text. At least, it implements the POSIX standard and
+is most likely available in your distribution's repository.
+<p>Here are the results of compressing an arbitrary tarball consisting mostly of source code plus some resources, all
+done with default options:
+
+<table>
+<tr><td>Source<td>22M
+<tr><td><b>bzip2</b><td>5.4M
+<tr><td><b>gzip</b><td>5.9M
+<tr><td><b>xz</b><td>2.6M
+<tr><td><b>ncompress</b><td>9.1M
+</table>
+
+<p>In other words, it's not bad, but it's lagging behind (more) modern programs. This could also be an additional
+reason why it is not used or even installed by default in most Linux distributions. I didn't check BSD's
+implementation, but I expect rather good results.
+<p>The main takeaway from this article is that if you plan to write anything that is portable across POSIX-compliant or
+semi-compliant systems, then you need to give compression slightly more attention.
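+<p>A minimal sketch of what that extra attention might look like in a portable shell script. The function names
+<code>pick_compressor</code> and <code>compress_stream</code> are hypothetical, and the preference order is only an
+assumption loosely based on the sizes in the table above. It relies on nothing more than <code>-c</code> and reading
+from standard input, which all four programs accept:

```shell
#!/bin/sh
# Hypothetical helper: print the name of the first compressor found on PATH.
# The order (xz, bzip2, gzip, compress) is an assumption, not a recommendation.
pick_compressor() {
    for candidate in xz bzip2 gzip compress; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: compress standard input to standard output
# with whatever pick_compressor found.
compress_stream() {
    tool=$(pick_compressor) || {
        echo "no known compressor found" >&2
        return 1
    }
    # With -c and no file operand, each of these tools reads standard input
    # and writes the compressed data to standard output.
    "$tool" -c
}
```

+<p>The caller still has to record which tool was picked, because the output format, and so the matching file
+extension and decompressor, depends on it.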
+
+</article>
+<script src="https://stats.ignore.pl/track.js"></script>
\ No newline at end of file