Diffstat (limited to 'how_to_compress_files_in_posix.html')
-rw-r--r--how_to_compress_files_in_posix.html117
1 files changed, 117 insertions, 0 deletions
diff --git a/how_to_compress_files_in_posix.html b/how_to_compress_files_in_posix.html
new file mode 100644
index 0000000..2ef743f
--- /dev/null
+++ b/how_to_compress_files_in_posix.html
@@ -0,0 +1,117 @@
+<!doctype html>
+<html lang="en">
+<meta charset="utf-8">
+<meta name="viewport" content="width=device-width, initial-scale=1">
+<meta name="author" content="aki">
+<meta name="tags" content="POSIX, compression, archiving, tutorial, guide, howto">
+<link rel="icon" type="image/png" href="cylo.png">
+<link rel="stylesheet" href="style.css">
+
+<title>How to Compress Files in POSIX</title>
+
+<nav><p><a href="https://ignore.pl">ignore.pl</a></p></nav>
+
+<article>
+<h1>How to Compress Files in POSIX</h1>
+<p class="subtitle">Published on 2021-08-14 19:48:00+02:00
+<p>This was quite an amusing one to read about. On one hand, the results kind of surprised me, but then, on the other...
+What exactly did I expect?
+<p>Anyway! How does one compress files in a POSIX-compliant system?
+<p>By the power of Hinchliffe's rule, I say: you don't. <i>Wait, what kind of tutorial is this</i>?</p>
+
+
+<h2>The standard way</h2>
+<img src="how_to_compress_files_in_posix-1.png" alt="the way">
+<p>POSIX defines three utilities for compression, but let's leave <b>zcat</b> aside and focus on the other two:
+<a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/compress.html">compress</a> and
+<a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/uncompress.html">uncompress</a>. They have quite
+descriptive names and are incredibly simple to use: just give them the names of the files to process and they will do
+the work. <b>compress</b> stores its result in a <code>*.Z</code> file, which <b>uncompress</b> turns back into the
+original:
+
+<pre>
+$ ls
+archive.tar
+$ <mark>compress archive.tar</mark>
+$ ls
+archive.tar.Z
+$ <mark>uncompress archive.tar.Z</mark>
+$ ls
+archive.tar
+</pre>
+
+<p>By default, the input file is replaced by the output. This can be avoided with the <code>-c</code> option, which
+writes the output to standard output instead:
+
+<pre>
+$ compress <mark>-c</mark> archive.tar <mark>&gt;archive.tar.Z</mark>
+$ uncompress <mark>-c</mark> archive.tar.Z <mark>&gt;archive.tar.bak</mark>
+$ ls
+archive.tar archive.tar.bak archive.tar.Z
+</pre>
+
+<p>And, of course, standard input can be used as well, via the special filename <code>-</code>.
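+
+<p>Together, these make the pair usable as a filter. Here is a minimal sketch, assuming conforming <b>compress</b> and
+<b>uncompress</b> are installed. <code>check_roundtrip</code> is a hypothetical helper name. It passes no file
+operands at all, which also makes both utilities read standard input and write standard output. It then uses
+<b>cmp</b>(1) to confirm that the bytes come back unchanged:
+
+<pre>
+# Hypothetical helper: round-trips file $1 through compress and
+# uncompress, returning 0 only if the result is identical to $1.
+# Assumes POSIX compress and uncompress are on the PATH.
+check_roundtrip() {
+    # Without file operands, both utilities act as stdin-to-stdout filters.
+    # -f keeps the output even if it is not smaller than the input;
+    # cmp -s compares the restored stream with the original silently.
+    compress -f &lt;"$1" | uncompress | cmp -s - "$1"
+}
+</pre>
+
+<p>For example, <code>check_roundtrip archive.tar &amp;&amp; echo identical</code> should print <code>identical</code>.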
+
+<p>So far it all looks good. That's because we have been discussing an imaginary implementation of these utilities,
+exactly as the POSIX standard describes them. The real world is different.
+
+
+<h2>The actual way</h2>
+<p>If, like me, you come from a Linux background, prepare for disappointment. If you are coming from BSD, then I have
+great news for you: you can stop here, because your system actually implements the standard.</p>
+
+<img src="how_to_compress_files_in_posix-2.png" alt="the other way">
+
+<p>Instead of <b>compress</b>, most Linux distributions come with <b>gzip</b>(1), usually
+<a href="https://www.gnu.org/software/gzip/">GNU Gzip</a>. The reason is, of course, legal: the LZW algorithm used by
+<b>compress</b> was patented. The full reasoning is covered in
+<a href="https://www.gnu.org/philosophy/gif.html">No GIF Files</a>. However, this is all in the past, because the LZW
+patents have since expired.
+<p>Let's put the story and reasons aside. What we are left with is an inability to conform to the standard for legal
+reasons, and that inability has itself become the standard. And so, on Linux systems you will end up using <b>gzip</b>,
+<b>xz</b>(1), <b>bzip2</b>(1), or really anything else:
+
+<pre>
+$ ls
+archive.tar
+$ <mark>gzip archive.tar</mark>
+$ ls
+archive.tar.gz
+$ <mark>gzip -d archive.tar.gz</mark>
+$ ls
+archive.tar
+</pre>
+
+<p>You can replace <code>gzip</code> with any of the mentioned utilities: they have very similar interfaces. Not only
+that, they are also partially compatible with the interface of <b>compress</b> defined by the POSIX standard. Each of
+them comes with a matching <i>un</i> command (e.g., <code>gunzip</code>, <code>unxz</code>, <code>bunzip2</code>) that
+can be used instead of the <code>-d</code> option. If you feel adventurous, you could try symlinking them (especially
+<b>gzip</b>, since <b>gunzip</b> can also decompress files created by <b>compress</b>).
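+
+<p>Here is a minimal sketch of the symlink trick, assuming GNU Gzip. As far as I know, it chooses its mode from the name
+it is invoked under and decompresses when that name starts with <code>un</code>. The helper name
+<code>link_uncompress</code> and the default <code>~/bin</code> directory are hypothetical, and the directory needs to
+be in your <code>PATH</code>. Only decompression is covered: <b>gzip</b> run under the name <code>compress</code> would
+still write gzip data rather than <code>*.Z</code> files.
+
+<pre>
+# Hypothetical helper; assumes GNU gzip (other gzip implementations may
+# not look at the name they are invoked under).
+link_uncompress() {
+    dir=${1:-"$HOME/bin"}
+    gz=$(command -v gzip) || return 1
+    mkdir -p "$dir" || return 1
+    # Invoked as "uncompress", GNU gzip behaves like gunzip, which can
+    # also read the *.Z files produced by compress.
+    ln -sf "$gz" "$dir/uncompress"
+}
+</pre>
+
+<p>After <code>link_uncompress</code>, running <code>uncompress -c archive.tar.Z &gt;archive.tar</code> should work much
+like the POSIX version, provided <code>~/bin</code> comes before any other <b>uncompress</b> in your <code>PATH</code>.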
+<p>This raises the question of format compatibility, but that comparison is big enough to deserve its own article.
+<p>In the end, if you want to use POSIX <b>compress</b> on GNU/Linux, you don't. Unless...
+
+
+<h2>The other way</h2>
+<p>Unless you use <a href="https://github.com/vapier/ncompress">ncompress</a>, which provides both <b>compress</b>(1)
+and <b>uncompress</b>(1). What's more, it descends directly from the original implementation. But there is one thing you
+need to know about it.
+<p>It's bad. Yes, a detailed comparison of compression algorithms is yet another huge and interesting topic, but this
+particular case really can be summed up as: it's bad. It's OK with text. On the upside, it implements the POSIX standard
+and is most likely available in your distribution's repository.
+<p>Here are the results of compressing an arbitrary tarball, made up mostly of source code plus some resources, with
+each program's default options:
+
+<table>
+<tr><th>Program<th>Size
+<tr><td>Uncompressed<td>22M
+<tr><td><b>bzip2</b><td>5.4M
+<tr><td><b>gzip</b><td>5.9M
+<tr><td><b>xz</b><td>2.6M
+<tr><td><b>ncompress</b><td>9.1M
+</table>
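+
+<p>To repeat the comparison on your own data, something along these lines should do. It is a rough sketch. The
+function name <code>compare_sizes</code> is hypothetical, and tools that are not installed are simply skipped. Sizes
+are printed in bytes, so they will not match the rounded values above exactly. Every tool is driven through the same
+<code>-c</code> option, which also shows how closely their interfaces match:
+
+<pre>
+# Hypothetical helper: prints the size in bytes of file $1 and of its
+# compressed form for each installed tool, all with default options.
+compare_sizes() {
+    [ -r "$1" ] || { echo "compare_sizes: cannot read $1" &gt;&amp;2; return 1; }
+    # $(( )) drops the padding that some wc implementations print.
+    size=$(( $(wc -c &lt;"$1") ))
+    printf '%-9s %12d\n' original "$size"
+    for tool in bzip2 gzip xz compress; do
+        # Skip tools that are not installed on this system.
+        command -v "$tool" &gt;/dev/null 2&gt;&amp;1 || continue
+        # -c writes the compressed stream to standard output.
+        size=$(( $("$tool" -c "$1" | wc -c) ))
+        printf '%-9s %12d\n' "$tool" "$size"
+    done
+}
+</pre>
+
+<p>Run it as <code>compare_sizes archive.tar</code>. The <code>compress</code> row only appears if ncompress (or
+another implementation) is installed.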
+
+<p>In other words, it's not unusable, but it lags behind the (more) modern programs. This could also be one more reason
+why it is not used, or even installed, by default in most Linux distributions. I didn't check BSD's implementation, but
+I expect rather good results.
+<p>The main takeaway from this article is that if you plan to write anything meant to be portable across POSIX-compliant
+or semi-compliant systems, you need to give compression slightly more attention.
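+
+<p>What that attention could look like is sketched below, under a few assumptions. The helper names
+<code>portable_compress</code> and <code>portable_uncompress</code> are hypothetical. POSIX <b>compress</b> is
+preferred whenever it is installed, and <b>gzip</b> is the fallback otherwise. On the decompressing side, the sketch
+relies on GNU Gzip also being able to read <code>*.Z</code> files:
+
+<pre>
+# Hypothetical helpers. portable_compress writes $1.Z with POSIX compress
+# when available, otherwise $1.gz with gzip; the original file is kept and
+# the name of the new file is printed.
+portable_compress() {
+    if command -v compress &gt;/dev/null 2&gt;&amp;1; then
+        # -f keeps the output even if it is not smaller than the input.
+        compress -c -f "$1" &gt;"$1.Z" &amp;&amp; printf '%s\n' "$1.Z"
+    elif command -v gzip &gt;/dev/null 2&gt;&amp;1; then
+        gzip -c "$1" &gt;"$1.gz" &amp;&amp; printf '%s\n' "$1.gz"
+    else
+        echo "portable_compress: no compressor found" &gt;&amp;2
+        return 1
+    fi
+}
+
+# Writes the decompressed contents of $1 to standard output, picking the
+# tool by suffix; gzip -d doubles as a fallback for *.Z files.
+portable_uncompress() {
+    case $1 in
+        *.Z)
+            if command -v uncompress &gt;/dev/null 2&gt;&amp;1; then
+                uncompress -c "$1"
+            else
+                gzip -d -c "$1"
+            fi ;;
+        *.gz) gzip -d -c "$1" ;;
+        *) echo "portable_uncompress: unknown suffix: $1" &gt;&amp;2
+           return 1 ;;
+    esac
+}
+</pre>
+
+<p>For example, <code>f=$(portable_compress archive.tar) &amp;&amp; portable_uncompress "$f" | cmp - archive.tar</code>
+should report no differences on either kind of system.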
+
+</article>
+<script src="https://stats.ignore.pl/track.js"></script> \ No newline at end of file