<!doctype html>
<html lang="en">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="author" content="aki">
<meta name="tags" content="web, web browser, linux, shell">
<link rel="icon" type="image/png" href="cylo.png">
<link rel="stylesheet" type="text/css" href="style.css">

<title>Plumbing Your Own Browser</title>

<nav><p><a href="https://ignore.pl">ignore.pl</a></p></nav>

<article>
<h1>Plumbing Your Own Browser</h1>
<p class="subtitle">Published on 2020-08-01 21:38:00+02:00</p>
<img src="plumbing_your_own_browser-1.png" alt="plumbing">
<p>In the spirit of the previous post about <a href="web_browsers_are_no_more.html">web browsers</a>, how about a little
experiment? Let's write a simple tool that handles downloading, history management and displaying content. It's meant
to be trivial and fun.
<p>Ideally, I think the architecture would divide into four parts: a protocol daemon, a navigator, an opener and view
engines. However, even with this setup some of them would have wide responsibilities. I don't really like that, but
I'll leave it to the future to deal with. Anyway, what do they do?</p>
<dl>
	<dt>protocol daemon<dd>Responsible for data acquisition and caching. For instance, an HTTP protocol daemon.
	<dt>navigator<dd>The quickest way to explain it: the address bar. It handles history, probably sessions, windows,
	and the user's initial requests to the protocol daemon. This one would need some attention to integrate it properly
	with the environment and to make sure that its responsibilities don't go too far.
	<dt>opener<dd>Not really xdg-open or rifle, but something of this sort. It gets data marked for display from the
	protocol daemon and acts as a demux for view engines.
	<dt>view engine<dd>Your usual browser, excluding the things that already appeared earlier. It may also be something
	else, like a completely normal image viewer, a hyperlinked markdown viewer or even less. Or something more, like a
	sandboxed application environment that is not a web application.
</dl>
<p>Sounds like a complex system, but we can do it easily in a short shell script. I won't bother with view engines:
right now it's rather time-consuming to get them to work, especially since browsers weren't written with this use case
in mind. Even the minimal ones won't do. Generally, they would need to communicate with the protocol daemon to retrieve
secondary data (like stylesheets or images) and with the navigator when the user clicks a link.
<p>Anyway, let's start with the protocol daemon! Our target is a web browser, so we need something to handle HTTP for
us. What else could we use if not curl? Frankly speaking, just curl could be sufficient to view things:</p>
<pre>
$ curl -sL https://ignore.pl/plumbing_your_own_browser.html
...
...
...
</pre>
<p>Yeah, if you use st as your terminal emulator like I do, then you need to add <code>| less</code> at the end so that
you can read it. Honestly, for documents that are written in a way that allows people to read them as plain text,
that's enough (posts on this website can be read in plain text).
<p>However, although it's tempting not to, I'll do more than that. Now that we have a protocol daemon that is not a
daemon, the next one is the opener. Why not the navigator? For now, the interactive shell will be the navigator. You'll
see how.
<p>It's possible that you already have something that could act as an opener (like rifle from the ranger file manager).
There are plenty of similar programs, including xdg-open. I believe that they could be configured to work nicely in this
setup, but let's write our own:</p>
<pre>
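#!/bin/sh
# Download into a temporary file kept in RAM; curl prints the content type on stdout.
TMP=$(mktemp -p /dev/shm) &&
	{ TYPE=$(curl -sLw "%{content_type}\n" "$@" -o "$TMP") &&
		# Pick a viewer based on the content type.
		case "$TYPE" in
			application/pdf) zathura "$TMP";;
			image/*) sxiv "$TMP";;
			text/*) less "$TMP";;
			*) hexdump "$TMP";;
		esac }
rm -f "$TMP"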
</pre>
<p>That's a lot of things to explain! Everything up to <code>case "$TYPE" in</code> is effectively the protocol daemon.
The <code>$@</code> is what comes from the navigator. In our case, it's the arguments from the shell that ran our
command. Next up, the case statement is the opener. Based on the output of curl's write-out, the script selects a
program to open the temporary file from the web. After that, the file is removed; in other words, caching is not
supported yet.
<p>Surprisingly, that's it: one hell of a minimal browser. It works nicely with PDF files, images and text formats that
are not extremely bloated. Possibly, with some tinkering around xdg-open and X default applications, some hyperlinks
between the formats could be made to work (e.g. a PDF linking to an external image).
<p>Now, I could go further and suggest an option like this:</p>
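<p>As a minimal sketch, the opener half of the script could be pulled out into a function that only maps a content type
to a viewer name, which makes the demux easy to try out without touching the network. The name
<code>pick_viewer</code> is made up for this example; the viewers are the ones from the script above:</p>

```shell
#!/bin/sh
# Sketch only: pick_viewer is a hypothetical name, not part of the script above.
# It prints the name of the viewer that the opener would run for a content type.
pick_viewer() {
	case "$1" in
		application/pdf*) printf '%s\n' zathura;;
		image/*) printf '%s\n' sxiv;;
		text/*) printf '%s\n' less;;
		*) printf '%s\n' hexdump;;  # fallback for everything unknown
	esac
}
```

<p>With it, the case statement in the script would shrink to <code>"$(pick_viewer "$TYPE")" "$TMP"</code>.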
<pre>
application/lua) lua_gui_sandbox "$TMP";;
</pre>
<p>I find it interesting and worth looking into. I'll leave it as an open thing to try out.
<p>There are some more things to consider. For instance, the views should know the base directory the file comes from,
as some hyperlinks are relative. In other words, programs used as views should allow stating the base of the address in
some way:</p>
<pre>
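curl -sLw "%{content_type}\n%{url_effective}\n" "$@" -o "$TMP" | {
	# curl writes the content type and the final URL on separate lines
	read -r TYPE
	read -r URL
	BASE_URL=$(strip_filename_from_url "$URL")
	# the trailing * tolerates parameters such as "; charset=utf-8"
	case "$TYPE" in
		text/html*) html_view --base "$BASE_URL" "$TMP";;
		text/markdown*) markdown --base "$BASE_URL" "$TMP";;
		# ...
	esac
}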
</pre>
<p>This way, <code>markdown</code> would know that when the user clicks a hyperlink with a relative path, it should
resolve that path against the base URL. The base URL could also provide information that matters for things like CORS.
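<p>The <code>strip_filename_from_url</code> helper used above does not exist anywhere yet. A minimal sketch, assuming it
only needs to drop the query string, the fragment and the last path segment, could look like this:</p>

```shell
#!/bin/sh
# Sketch of the hypothetical helper: print the URL without its last path
# segment, always ending in a slash, so relative links can be appended to it.
strip_filename_from_url() {
	url=${1%%\#*}    # drop the fragment, if any
	url=${url%%\?*}  # drop the query string, if any
	case "$url" in
		*://*/*) printf '%s/\n' "${url%/*}";;  # cut everything after the last slash
		*) printf '%s/\n' "$url";;             # no path at all, e.g. https://ignore.pl
	esac
}
```

<p>For a simple relative link such as <code>cylo.png</code>, a view could then just concatenate the base and the link.
Links that start with a slash or contain <code>../</code> need proper resolution, which this sketch doesn't attempt.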
<p>For now, that's it. The ideas are still unrefined, but at least they are moving somewhere. Hopefully, I will get
myself to write something that could act as a view and respect this concept. My priority should be an HTML view, but I
feel like starting with simplified Markdown (one without HTML).
</article>
<script src="https://stats.ignore.pl/track.js"></script>