Published Deconstructing Web Browsers

author: Aki <please@ignore.pl> 2021-07-25 19:24:21 +0200
committer: Aki <please@ignore.pl> 2021-07-25 19:24:21 +0200
commit: c0b3870dde1d355de40515376ffd5bc87442e21f (patch)
tree: 4743de8f455f13bb737ab46f06978da9a10ecf9d
parent: ad76e9b885c9b9692074cf5b8b880cb79f8a48e0 (diff)
download: ignore.pl-c0b3870dde1d355de40515376ffd5bc87442e21f.zip
ignore.pl-c0b3870dde1d355de40515376ffd5bc87442e21f.tar.gz
ignore.pl-c0b3870dde1d355de40515376ffd5bc87442e21f.tar.bz2
9 files changed, 141 insertions, 183 deletions
diff --git a/deconstructing_web_browsers-1.png b/deconstructing_web_browsers-1.png
new file mode 100644
index 0000000..e4b5d59
--- /dev/null
+++ b/deconstructing_web_browsers-1.png
diff --git a/deconstructing_web_browsers-2.png b/deconstructing_web_browsers-2.png
new file mode 100644
index 0000000..0fc3dc6
--- /dev/null
+++ b/deconstructing_web_browsers-2.png
diff --git a/deconstructing_web_browsers.html b/deconstructing_web_browsers.html
new file mode 100644
index 0000000..855ee39
--- /dev/null
+++ b/deconstructing_web_browsers.html
@@ -0,0 +1,134 @@
+<!doctype html>
+<html lang="en">
+<meta charset="utf-8">
+<meta name="viewport" content="width=device-width, initial-scale=1">
+<meta name="author" content="aki">
+<meta name="tags" content="web, browser">
+<link rel="icon" type="image/png" href="cylo.png">
+<link rel="stylesheet" href="style.css">
+
+<title>Deconstructing Web Browsers</title>
+
+<nav><p><a href="https://ignore.pl">ignore.pl</a></p></nav>
+
+<article>
+<h1>Deconstructing Web Browsers</h1>
+<p class="subtitle">Published on 2021-07-25 14:53:00+02:00
+<p>Welcome to one of my little experiments! The theme is simple: to deconstruct a web browser and create several
+utilities with distinct and clear responsibilities in its stead. This is not your regular blog post. Honestly, I'm not
+yet sure what it is, but I'll figure it out at some point. Expect this page to be updated and extended. (Well, just like
+my regular posts, for some reason, I need to rethink it one more time...)
+
+<h2>Motivation and History</h2>
+<p>The idea started to sprout in my mind a few years ago. In its early stages it wasn't really directed at web browsers
+but instead <a href="markdown_is_bad_for_you.html">it focused on markdown</a>. After giving it some more thinking, it
+changed target to <a href="web_browsers_are_no_more.html">web browsers</a>. Now it also draws from Unix philosophy and
+my general aversion towards IDEs; or rather, this kind of modularity has become visible as the real core of the
+motivation that drives this idea. I haven't really touched or explored that core yet. I haven't tried to discredit
+it either. Hopefully, once I reach that point, it will stand its ground.</p>
+
+<img src="deconstructing_web_browsers-1.png" alt="scroll with history">
+
+<p>Last year, I explored this idea a bit in a two-part text series and within a small project called
+<a href="https://git.ignore.pl/browse/">browse</a>. I naively split the responsibilities between programs and had some
+fun writing very simple scripts that did the work. And they did the work surprisingly well, but functionality
+constraints had to be extremely strict. Recently, I came back to it, read my own stuff, looked at my own code, and I
+could still relate to it. Instead of removing everything like I sometimes do, I decided to develop
+<a href="https://git.ignore.pl/markdown/">a new utility</a> and write this new summary and project status.
+
+<h2>Experimenting From a Terminal</h2>
+<p>Rather than jumping into design or development work straight away, let's see how far we can get using only a
+shell and some usual utilities you can find in every shed. To access a webpage, one could potentially eat it raw:
+
+<pre>
+$ curl -sL https://ignore.pl/ | less
+...
+</pre>
+
+<p>Now, that's raw! With a page like this one, it's possible. I write them by hand and comply with my own rules that
+make it possible for the reader to consume them as plain text. However, it's not very useful considering how
+astoundingly obfuscated modern HTML pages can get.
+
+<p>It's not only extremely complex HTML hierarchies that we need to deal with. Other great opponents are web
+applications that pretend to be webpages. Separating those two will prove itself to be useful. Not only that, it will
+also open us to new possibilities. Consider a dead simple script that acts similarly to a regular opener:
+
+<pre>
+#!/bin/sh
+TMP=$(mktemp -p /dev/shm) &&
+	{ TYPE=$(curl -sLw "%{content_type}\n" "$@" -o "$TMP") &&
+		case "$TYPE" in
+			application/pdf) zathura "$TMP";;
+			image/*) sxiv "$TMP";;
+			text/html*) html_viewer "$TMP";;
+			text/markdown*) markdown_viewer "$TMP";;
+			text/*) less "$TMP";;
+			*) echo "$TMP";;
+		esac }
+rm -f "$TMP"
+</pre>
+
+<p>You use it like this:
+
+<pre>
+$ ./script https://ignore.pl/
+</pre>
+
+<p>It shows the requested content using a program that's selected based on its MIME type. Here, the difference between
+a webpage and a web application is blurred. Hypothetically, using MIME types or some other means, we could add switch
+cases like these:
+
+<pre>
+web-application/html+js) fork_of_chromium_or_something "$TMP";;
+web-application/lua) lua_gui_sandbox "$TMP";;
+</pre>
+
+<p>The ability to support multiple competing frameworks that are meant to run seamlessly loaded, sandboxed applications
+(so, web applications) is what really interests me.
+
+<p>That's not the only thing though. As you can see, in this example markdown and HTML are completely separated.
+Markdown is no longer a format that's supposed to generate HTML but instead it becomes a stand-alone hypertext format.
+Because content requests are meant to run through such a demultiplexer, hyperlinks can lead from one hypertext
+format to another. <b>This allows new formats and new ways of expression to grow and compete</b>, hopefully breathing
+some life into an ecosystem that's currently driven by monolithic giants.</p>
+
+<img src="deconstructing_web_browsers-2.png" alt="bacteria or something, dunno">
+
+<h2>Browser That’s Part of Your Environment</h2>
+<p>Of course, a single script like the example above is not the way to go, but it's a good start as it gives insight
+into data flow and responsibilities. At first, just by looking at it, I decided to naively distinguish four components:
+
+<dl>
+<dt>navigator
+<dd>Takes the address of a request from the user and forwards it to a <i>protocol daemon</i>. Retrieved content is then
+pushed to <i>opener</i>.
+<dt>protocol daemon
+<dd>Acquires and caches data using a single protocol, e.g., HTTP.
+<dt>opener
+<dd>Chooses viewers based on content type.
+<dt>viewer
+<dd>Presents content to the user and allows them to interact with it.
+</dl>
+
+<p>I found it to be a decent starting point and played around with it, getting encouraging results. All predicted
+obstacles made their appearances and, thanks to working prototypes, the shortcomings of each role were shown. In the
+second iteration I wanted to divide <i>navigator</i> into several stand-alone parts, but in the end I never committed to it.
+
+<p>Based on the description above, it doesn't seem as if <i>navigator</i> would require such division. Actually,
+<a href="https://git.ignore.pl/browse/tree/browse?id=9dca05999d355deb225938ba4f57858ca27ca130">the current
+implementation</a> doesn't clearly show such a need either. The only hints are the <code>-f</code> option in
+<i>navigator</i> and <i>opener</i>, and direct calls to <i>protocol daemon</i> by <i>viewers</i> to retrieve auxiliary
+content (e.g., a stylesheet or an embedded image). This means that <i>navigator</i> is hiding a plumbing-capable
+<i>request resolver</i> below the porcelain interface that's dedicated to the user.
+
+<p>More than that, <i>navigator</i> may also be hiding functionality meant to support browsing history, which I haven't
+explored at all yet. Combining it with graphical interfaces, session management, or tabs is still a question mark.
+
+<p>Obviously, the responsibilities of the components are not the only matter to think about. Interfaces in every form
+are also important. I'm talking here about communication between the components of the browser, interchangeability,
+communication between the browser and the rest of the environment it runs in, and integration with graphical user
+interfaces and window managers.
+
+<p>For now, I plan to split <i>navigator</i> and look into an equivalent of an address bar.
+</article>
+<script src="https://stats.ignore.pl/track.js"></script>
diff --git a/plumbing_your_own_browser-1.png b/graveyard_of_the_drawings-10.png
index bbfebec..bbfebec 100644
--- a/plumbing_your_own_browser-1.png
+++ b/graveyard_of_the_drawings-10.png
diff --git a/integrating_browser_into_your_environment-1.png b/graveyard_of_the_drawings-11.png
index 4c2d87a..4c2d87a 100644
--- a/integrating_browser_into_your_environment-1.png
+++ b/graveyard_of_the_drawings-11.png
diff --git a/graveyard_of_the_drawings.html b/graveyard_of_the_drawings.html
index 2e67da6..7c3ac65 100644
--- a/graveyard_of_the_drawings.html
+++ b/graveyard_of_the_drawings.html
@@ -13,7 +13,7 @@
 
 <article>
 <h1>Graveyard of the Drawings</h1>
-<p class="subtitle">Last modified on 2021-03-19 19:53+01:00
+<p class="subtitle">Last modified on 2021-07-25 19:21+02:00
 <p>Here are the drawings I made for articles that I decided to remove. No context, no nothing. Just images. Despite
 the style, I still think that it'd be a little bit of waste to just remove them along the texts and reusing them in
 different articles is just lazy.</p>
@@ -26,4 +26,6 @@ different articles is just lazy.</p>
 <img src="graveyard_of_the_drawings-7.png">
 <img src="graveyard_of_the_drawings-8.png">
 <img src="graveyard_of_the_drawings-9.png">
+<img src="graveyard_of_the_drawings-10.png">
+<img src="graveyard_of_the_drawings-11.png">
 </article>
diff --git a/index.html b/index.html
index a89d9a2..5a9b5e1 100644
--- a/index.html
+++ b/index.html
@@ -31,8 +31,7 @@ completely discard the concept of a keyboard.
 		<li>Rebuilding Web Browsing
 			<ol>
 				<li><a href="web_browsers_are_no_more.html">Web Browsers Are No More</a>
-				<li><a href="plumbing_your_own_browser.html">Plumbing Your Own Browser</a>
-				<li><a href="integrating_browser_into_your_environment.html">Integrating Browser Into Your Environment</a>
+				<li><a href="deconstructing_web_browsers.html">Deconstructing Web Browsers</a>
 			</ol>
 		<li><a href="of_privacy_and_traffic_tracking.html">Of Privacy and Traffic Tracking</a>
 		<li><a href="how_to_write_a_minimal_html5_document.html">How to Write a Minimal HTML5 Document</a>
@@ -69,6 +68,9 @@ completely discard the concept of a keyboard.
 <section id="news">
 <h2>News</h2>
 <p><strong><time>2021-07-25</time></strong>
+Published <a href="deconstructing_web_browsers.html">Deconstructing Web Browsers</a>, a summary of the now-removed
+<i>Plumbing Your Own Browser</i> and <i>Integrating Browser Into Your Environment</i>.
+<p><strong><time>2021-07-25</time></strong>
 Initialized website as git repository. Let's see if it will be useful.
 <p><strong><time>2021-07-25</time></strong>
 Rewritten parts of and updated <a href="web_browsers_are_no_more.html">Web Browsers Are No More</a>.
diff --git a/integrating_browser_into_your_environment.html b/integrating_browser_into_your_environment.html
deleted file mode 100644
index e67bfea..0000000
--- a/integrating_browser_into_your_environment.html
+++ /dev/null
@@ -1,81 +0,0 @@
-<!doctype html>
-<html lang="en">
-<meta charset="utf-8">
-<meta name="viewport" content="width=device-width, initial-scale=1">
-<meta name="author" content="aki">
-<meta name="tags" content="web, browser, unix philosophy">
-<link rel="icon" type="image/png" href="cylo.png">
-<link rel="stylesheet" href="style.css">
-
-<title>Integrating Browser Into Your Environment</title>
-
-<nav><p><a href="https://ignore.pl">ignore.pl</a></p></nav>
-
-<article>
-<h1>Integrating Browser Into Your Environment</h1>
-<p class="subtitle">Published on 2020-08-12 23:15:00+02:00
-<p>Not so long ago I finally started to play around with a little idea I had when I was writing
-<a href="markdown_is_bad_for_you.html">the rant about markdown</a>. That little idea was to split a web browser into
-possibly several smaller utilities with distinct responsibilities. In other words, to apply Unix-ish philosophy in a
-web browser. I've touched this idea in <a href="web_browsers_are_no_more.html">Web browsers are no more</a> and then
-did some initial tinkering in <a href="plumbing_your_own_browser.html">Plumbing your own browser</a>. Now time has come
-to draw conclusions. Think of this post as a direct update to the plumbing one.
-<p>I don't like IDEs. I have hand-crafted environments that I "live in" when I'm working on any of my computers. Window
-manager that I tinkered to my liking, my preferred utilities, my text editor, my shortcuts. Whole operating system is
-configured with one thing kept in mind: it belongs to me. IDEs invade this personal space of mine. And so do web
-browsers. Of course, you can configure both web browsers and IDEs to some extent. You can even integrate them closer to
-your normal environment, but in my experience sooner or later you'll run into limitations. Or you will end up with IDE
-consuming your entire operating system (hello, emacs!). I didn't like that.
-<p>Thanks to the number of alternatives I can happily avoid using IDEs. I can't say that about browsers. Moreover, modern
-browsers are enormous and hermetic. Usually the only utility you have to interface with them is <code>browse</code>
-which in turn is usually just a symbolic link to <code>xdg-open</code>. Not only that, but they only open links in
-their rendering engine and may allow saving a file, so that the user can use it after leaving the browser alone.
-<p>Because of that, and because of other reasons I described in before-mentioned articles, I decided to try if splitting
-browser into smaller utilities is a viable option, and just play around this idea.
-<p>For now, I've split it into four parts, but I can see more utilities emerging:
-<dl>
-<dt>request solver
-<dd>Previously, I referred to it as "browse" utility. But the way I have "browse" implemented now implies more than just
-one responsibility. On the other hand, the request solver is meant to only oversee a request. It means it has all the pieces
-of information and passes them to utilities in order to complete the request. It interacts with most of other programs
-and may interact with user.<br>
-It's one of the most important parts of this system. Due to nature of more verbose media like websites it should support
-more than just "get this URI and show it in a view". For instance, it should be able to allow user (or view) to open the
-resource in currently used active window or just retrieve files without opening them (in case of e.g. stylesheets). I
-believe that there is enough room in here to separate even more utilities.
-<dt>protocol demultiplexer
-<dd>This one is also a part of the "browse" as of now, just because at this stage it can be a simple switch case or even
-non-existent, assuming I plan to support only one protocol (e.g. http). One could pass this responsibility to the file
-system, if protocols were to be implemented at this level (the Hurd-ish way).
-<dt>protocol daemon
-<dd>Not really a daemon (but it can be one!). Retrieves and points to data needed by the request solver.
-<dt>opener/view demultiplexer
-<dd>Your usual <code>xdg-open</code> clone. A more verbose switch case that opens the resources in appropriate views.
-<dt>view/view engine
-<dd>Displays the retrieved resource to a user. It's aware of its content and may request secondary files through request
-solver (again, e.g. stylesheet or an image). Displays hyperlinks and redirects them to request solver. It's almost
-completely agnostic to how they should be handled. It may suggest request solver to open the link in current view, if
-the resource type is supported and the view is desired to handle this type of resource.
-</dl>
-<p>Now then, the implementation currently has request solver and protocol demultiplexer in one utility called "browse". I
-see quite a lot of opportunities to split the request solver a little bit more, or at least move some of the tasks to
-already existing programs. Nonetheless, they're way more separated than most modern browsers.</p>
-<img src="integrating_browser_into_your_environment-1.png" alt="demux, I really like this word">
-<p>The biggest pain in all of this is an HTML engine. The more verbose ones were never intended to be used like this.
-On the other hand the limited one that I wrote just for this experiment is... Well, way too limited. It allows me to
-browse simpler websites like my own, but has problems in those that have CSS that's longer than the website content.
-Of course, I don't even mention modern web applications, obviously they won't work without Javascript.
-<p>Surprisingly, despite the enormity of problems mostly related to HTML, CSS or Javascript, I'm staying positive. It
-works, it can be integrated in the environment and it's an interesting idea to explore. For some reason it feels like
-I took <code>xdg-open</code> to extremes (that's why I keep mentioning it), but I think it's just because I am yet to
-polish the concept.
-<p>For now, <a href="https://git.ignore.pl/browse/">the utilities</a> are available publicly. You can use them to try
-out the idea. I've left there one simple example that uses <code>dmenu</code> for opening an URI either from list of
-bookmarks or one entered by hand. Moving base address and some mime type to command line options, should give the
-utilities enough flexibility to use e.g. opener to open local files as well. Then it can be used with <code>lf</code> or
-any file manager of your choice, and you'll have single utility to handle all kinds of openings.
-<p>I'll move now to other ideas that I left without any conclusion. However, I'm looking forward to seeing if this one
-can bring more in the future and most certainly I'll return to it with full focus.
-
-</article>
-<script src="https://stats.ignore.pl/track.js"></script>
diff --git a/plumbing_your_own_browser.html b/plumbing_your_own_browser.html
deleted file mode 100644
index 4f9b999..0000000
--- a/plumbing_your_own_browser.html
+++ /dev/null
@@ -1,99 +0,0 @@
-<!doctype html>
-<html lang="en">
-<meta charset="utf-8">
-<meta name="viewport" content="width=device-width, initial-scale=1">
-<meta name="author" content="aki">
-<meta name="tags" content="web, web browser, linux, shell">
-<link rel="icon" type="image/png" href="cylo.png">
-<link rel="stylesheet" type="text/css" href="style.css">
-
-<title>Plumbing Your Own Browser</title>
-
-<nav><p><a href="https://ignore.pl">ignore.pl</a></p></nav>
-
-<article>
-<h1>Plumbing Your Own Browser</h1>
-<p class="subtitle">Published on 2020-08-01 21:38:00+02:00</p>
-<img src="plumbing_your_own_browser-1.png" alt="plumbing">
-<p>In spirit of the previous post about <a href="web_browsers_are_no_more.html">web browsers</a>, how about a little
-experiment? Let's write a simple tool that implements downloading, history management and displaying the content. This
-is intended as a trivial and fun experiment.
-<p>Ideally, I think the architecture would divide into: protocol daemon, navigator, opener and view engines. However,
-even with this setup some of them would have wide responsibilities. I don't really like that, but I leave it to future
-to deal with. Anyway, what do they do?</p>
-<dl>
-	<dt>protocol daemon<dd>Responsible for data acquisition and caching. For instance HTTP protocol daemon.
-	<dt>navigator<dd>The quickest way to explain it: the address bar. It handles history, probably sessions, windows,
-	initial requests to protocol daemon from the user. This one would need some attention to properly integrate it with
-	the environment and make sure that its responsibilities don't go too far.
-	<dt>opener<dd>Not really xdg-open or rifle, but something of this sort. Gets data marked for display from the
-	protocol server and acts as a demux for view engines.
-	<dt>view engine<dd>Your usual browser excluding things that already appeared earlier. It may also be something else,
-	like completely normal image viewer, hyperlinked markdown viewer or even less. Or more like sandboxed application
-	environment that is not a web application.
-</dl>
-<p>Sounds like a complex system, but we can do it easily in a short shell script. I won't bother with view engines, as
-right now, that's rather time consuming to get them to work, especially since browsers weren't written with this use case in
-mind. Even those minimal ones can't do. Generally, they would need to communicate with protocol server to retrieve
-secondary data (like stylesheet or images) and communicate with navigator when user clicked some kind of link.
-<p>Anyway, let's start with protocol daemon! Our target is web browser, so we need something to handle HTTP for us. What
-else could we use if not curl? Frankly speaking, just curl could be sufficient to view things:</p>
-<pre>
-$ curl -sL https://ignore.pl/plumbing_your_own_browser.html
-...
-...
-...
-</pre>
-<p>Yeah, if you use st as terminal emulator like I do, then you need to add <code>| less</code> at the end, so that you
-can read it. Honestly, with documents that are written in a way that allows people to read them as plain text, that's
-enough (posts on this website can be read in plain text).
-<p>However, although it's tempting to not, I'll do more than that. Now that we have a protocol daemon that is not a
-daemon, the next one is the opener. Why not navigator? For now interactive shell will be the navigator. You'll see how.
-<p>It's possible that you already have something that could act as an opener (like rifle from ranger file manager).
-There are plenty of similar programs, including xdg-open. I believe that they could be configured to work nicely in this
-setup, but let's write our own:</p>
-<pre>
-#!/bin/sh
-TMP=$(mktemp -p /dev/shm) &&
-	{ TYPE=$(curl -sLw "%{content_type}\n" "$@" -o "$TMP") &&
-		case "$TYPE" in
-			application/pdf) zathura "$TMP";;
-			image/*) sxiv "$TMP";;
-			text/*) less "$TMP";;
-			*) hexdump "$TMP";;
-		esac }
-rm -f "$TMP"
-</pre>
-<p>That's a lot of things to explain! The first two lines, up to <code>case "$TYPE" in</code>, are actually the protocol daemon. The
-<code>$@</code> is what comes from the navigator. In our case, it's the arguments from the shell that runs our command.
-Next up, the case statement is the opener. Based on the output of curl's write-out the script selects program to open
-the temporary file from the web. After that, the file is removed, in other words caching is not supported yet.
-<p>Surprisingly, that's it, hell of a minimal browser. Works nicely with pdf files, images and text formats that are not
-extremely bloated. Possibly with some tinkering around xdg-open and x default applications some hyperlinks between the
-formats could be made (e.g. a pdf links to an external image).
-<p>Now, I could go further and suggest an option like this:</p>
-<pre>
-application/lua) lua_gui_sandbox "$TMP";;
-</pre>
-<p>I find it interesting and worth looking into. I'll leave it as an open thing to try out.
-<p>There are some more things to consider. For instance, the views should know the base directory the file comes from as
-some hyperlinks are relative. In other words, programs used as views should allow to state base of the address in some
-way:</p>
-<pre>
-{ curl -sLw "%{content_type}\n%{url_effective}\n" "$@" -o "$TMP" | {
-	read -r TYPE
-	read -r URL
-	BASE_URL=$(strip_filename_from_url "$URL") &&
-		case "$TYPE" in
-			text/html*) html_view --base "$BASE_URL" "$TMP";;
-			text/markdown*) markdown --base "$BASE_URL" "$TMP";;
-			# ...
-		esac } }
-</pre>
-<p>By then, the <code>markdown</code> would know that if the user clicks some hyperlink with a relative path, then it
-should append the base path to it. It could also provide information that matters in e.g. CORS.
-<p>For now, that's it. The ideas are still unrefined, but at least they are moving somewhere. Hopefully, I will get
-myself to write something that could act as a view and respect this concept. My priority should be HTML view but I feel
-like starting with simplified Markdown (one without HTML).
-</article>
-<script src="https://stats.ignore.pl/track.js"></script>
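A note on the opener script added in deconstructing_web_browsers.html: the article describes the opener as the component that chooses a viewer based on content type. A minimal sketch of that dispatch as a standalone shell function might look like the following. The function name `choose_viewer` is an assumption made for illustration and is not part of the browse project; `html_viewer` and `markdown_viewer` are the same placeholder names the article uses, not real programs. Parameters such as `; charset=utf-8` are stripped first, so exact patterns like `application/pdf` still match when a server appends them.

```shell
#!/bin/sh
# Hypothetical helper (not part of the browse project): print the name of the
# program the opener would run for a given Content-Type value.
choose_viewer() {
	# Drop parameters such as "; charset=utf-8", keeping only type/subtype.
	ctype=${1%%;*}
	case "$ctype" in
		application/pdf) echo zathura;;
		image/*) echo sxiv;;
		text/html) echo html_viewer;;          # placeholder name from the article
		text/markdown) echo markdown_viewer;;  # placeholder name from the article
		text/*) echo less;;
		*) echo echo;;                         # as in the article: just print the path
	esac
}
```

With curl's `%{content_type}` write-out stored in `TYPE` and the download saved to `$TMP`, as in the article's script, the dispatch could then be written as `"$(choose_viewer "$TYPE")" "$TMP"`. Keeping the mapping in its own function is only one possible design; it makes the opener testable without touching the network.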