<!doctype html>
<html lang="en">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="author" content="aki">
<meta name="tags" content="arch linux, pacman, emscripten, packaging, distribution">
<meta name="published-on" content="2022-06-19T17:35:00+02:00">
<link rel="icon" type="image/png" href="favicon.png">
<link rel="stylesheet" href="style.css">

<title>Using pacman to Manage Emscripten Packages</title>

<header>
<nav><a href="https://ignore.pl">ignore.pl</a></nav>
<time>19 June 2022</time>
<h1>Using pacman to Manage Emscripten Packages</h1>
</header>

<article>
<p>C was created for use with Unix. Quite quickly it became one of the most used programming languages of all time.
Some years later it made its way into the Linux kernel and the rest of the operating system. To this day, it is the
primary language used to interface with the kernel or to write all sorts of utilities, if not directly then through
various bindings.
<p>To allow the use of external libraries, C has a mechanism for including header files in your own source code. Then,
during the linking stage, the compiled implementation of these headers is linked along with your code into the final
executable (or, with some additional steps, loaded by the dynamic linker). What is visible for including and linking is
configured through several PATH-like variables with some defaults, and sometimes (if you were having a good day and it
had to be ruined) through hidden or undocumented behaviour.
<p>Management of the available packages that contain headers and libraries is usually offloaded to the system-wide
package manager. Considering the relation between C and the system that hosts it, that isn't a bad choice. It is not
perfect, but with well-maintained upstream and local ecosystems it's just right.
<p>Problems may appear when we change or take away one of the parts: the operating system, the C toolchain, or the
package manager. The most prominent examples of when that happens are Windows, cross-compiling, and porting software
between distros. Such cases, especially the first one, led to the creation of external package managers, e.g.,
<a href="https://vcpkg.io/">vcpkg</a> and <a href="https://conan.io/">Conan</a>. In other cases, they pushed people
toward build generators such as <a href="https://cmake.org/">CMake</a>.</p>
<img src="using_pacman_to_manage_emscripten_packages-1.png" alt="emscripten logo">
<p>Recently, I've been playing around with <a href="https://emscripten.org/">Emscripten</a>. I built some things here
and there, and now, you guessed it, I'm trying out different approaches to handling libraries. I decided to explore
<a href="https://archlinux.org/pacman/">pacman</a>(1) as a means to that end. I hope you enjoy this little experiment.
<p>Without going into internals, <b>pacman</b> is a package manager used by <a href="https://archlinux.org/">Arch
Linux</a>, a distribution that describes itself as lightweight, flexible, and simple. It focuses on bleeding-edge
packages. I picked it because I happen to use it on a daily basis.
<p>Packages are distributed in binary form and come from remote repositories. A package is an archive that contains the
files meant for installation along with some metadata, all built by
<a href="https://archlinux.org/pacman/makepkg.8.html">makepkg</a>(8). A repository is really just a set of files managed
by <a href="https://archlinux.org/pacman/repo-add.8.html">repo-add</a>(8).


<h3>Building Sample Package</h3>
<p>I started by creating a sample package that provides <a href="https://www.raylib.com/">raylib</a>. To do that, I
wrote a rather simple <code>PKGBUILD</code> file:

<pre>
pkgname=raylib
pkgver=4.0.0
pkgrel=1
arch=(wasm32)
license=(zlib)
makedepends=(cmake emscripten)
source=("${pkgname}.tar.gz::https://github.com/raysan5/raylib/archive/refs/tags/${pkgver}.tar.gz")
sha256sums=("11f6087dc7bedf9efb3f69c0c872f637e421d914e5ecea99bbe7781f173dc38c")
</pre>

<p>Stop right there! If you are a seasoned package maintainer, or have simply cross-compiled enough software, you will
notice that something is not right here. Yes, <code>arch</code> is wrong. It's a bit counter-intuitive, so take a look at
another package,
<a href="https://archlinux.org/packages/community/any/aarch64-linux-gnu-glibc/">aarch64-linux-gnu-glibc</a>, the GNU C
Library for ARM64 targets:

<pre>
$ asp checkout aarch64-linux-gnu-glibc
$ cd aarch64-linux-gnu-glibc/trunk
$ grep arch= PKGBUILD
arch=(any)
</pre>

<p>This is different for a good reason: nothing in this package is going to be executed on the host system. The only
programs that run are the cross-compiler and binutils, and those are built for the architecture of the build host:
<i>x86_64</i> in this case. From the host's point of view, the ARM64 libraries are just data, hence <i>any</i>.
<p>Then why am I specifying <i>wasm32</i> for my package?
<p>Emscripten uses cache directories that each contain a copy of the sysroot. A host system may contain several caches,
and each will have its own sysroot. I'm not entirely sure what the reasoning behind it is, but that's how it looks at the
time of writing.
<p>The <b>glibc</b> package specifies the <i>any</i> architecture because it is intended to be installed in
<code>/usr/aarch64-linux-gnu</code>, which is where the compiler expects to find it. I could technically make my package
work in a similar manner and install it to <code>/usr/lib/emscripten/system</code>, which acts as the base for caches
and is provided by the emscripten package from the Arch Linux repositories. I didn't do that because I wanted installed
packages to be immediately available in my cache. To accomplish that, I decided to use <b>pacman</b> the way you would
when bootstrapping a new system installation, treating the cache's sysroot as the root of a separate system. Since the
package is technically targeted at <i>wasm32</i>, that's what I wrote in <code>PKGBUILD</code>.
<p>I think the normal way is also worth exploring, assuming that I first figure out how to deal with caches, why the
emscripten package does not install to the usual <code>/usr/wasm32-emscripten</code>, and how to handle propagation of
packages.</p>
<img src="using_pacman_to_manage_emscripten_packages-2.png" alt="pacman, heh">
<p>Anyway, I went the other way and had to hack my way through. Let's continue with the <code>PKGBUILD</code>:

<pre>
build() {
  cd "${pkgname}-${pkgver}"
  emcmake cmake . -B build \
    -DPLATFORM=Web \
    -DBUILD_EXAMPLES=OFF \
    -DCMAKE_INSTALL_PREFIX=/usr
  cd build
  make
}

package() {
  cd "${pkgname}-${pkgver}/build"
  make DESTDIR="${pkgdir}" install
  cd ..
  install -Dm644 LICENSE "${pkgdir}/usr/share/licenses/${pkgname}/LICENSE"
}
</pre>

<p>I use the CMake wrapper from the Emscripten tools. The only part worth noting is that, by default, CMake with
Emscripten would set <code>CMAKE_INSTALL_PREFIX</code> to the path of the currently used cache directory. That's not
feasible for staging packages meant for distribution, so I use plain <code>/usr</code> instead. The thing is, Emscripten
uses the <code>include</code> and <code>lib</code> directories located directly in the sysroot and not in
<code>/usr</code>, so I will need to adjust that at a later stage. I can't simply use <code>/</code> as the prefix right
now, because raylib uses GNUInstallDirs, which expands a <code>/</code> prefix to <code>/usr</code>.
<p>The package is ready to be built:

<pre>
$ makepkg --printsrcinfo &gt;.SRCINFO
$ CFLAGS='' CARCH=wasm32 makepkg
==> Making package: raylib 4.0.0-1
==> ...
==> Finished making: raylib 4.0.0-1
$ ls *.pkg.tar.zst
raylib-4.0.0-1-wasm32.pkg.tar.zst
</pre>

<p>First off, I unset <code>CFLAGS</code> so that the default options from <code>/etc/makepkg.conf</code> don't cause
problems. I also need to set <code>CARCH</code> to tell <b>makepkg</b> that I'm cross-compiling to <i>wasm32</i>.


<h3>Setting up Repository</h3>
<p>Now that I had the package, I needed to "distribute" it. Repositories used by <b>pacman</b> are dead simple. They can
be served over HTTP or FTP, or straight from the local file system. The layout is the same for all methods: a directory
with the package files and a central database file. The whole setup was:

<pre>
$ mkdir -p <u>repo_path</u>/wasm32/core
$ cd <u>repo_path</u>/wasm32/core
$ mv <u>package_path</u>/raylib-4.0.0-1-wasm32.pkg.tar.zst .
$ repo-add core.db.tar.gz *.pkg.tar.zst
</pre>

<p>Yeah, that's it. First, create a directory for the repository and change into it. The path contains both the
architecture and the name of the repository. Then move the built package into that directory and, finally, add it to
the database named after the repository. Now it's a matter of making <b>pacman</b> use it.
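<p>If this has to be repeated for every new package, the steps can be wrapped in a small shell function. The sketch below
is not part of the original setup: <code>publish_pkgs</code> is a made-up name, it copies rather than moves the package,
and the <code>REPO_ADD</code> variable exists only so that <b>repo-add</b> can be swapped for another command in a dry
run:

<pre>
# Hypothetical helper (not part of the original setup): copy built packages
# into REPO_PATH/ARCH/REPO and regenerate the database named after the repo.
# REPO_ADD can name another command for a dry run; defaults to repo-add(8).
# Usage: publish_pkgs REPO_PATH wasm32 core raylib-4.0.0-1-wasm32.pkg.tar.zst
publish_pkgs() {
  if [ "$#" -lt 4 ]; then
    echo "usage: publish_pkgs REPO_PATH ARCH REPO PACKAGE..." >&2
    return 1
  fi
  local repo_path=$1 arch=$2 repo=$3
  shift 3
  local dir="$repo_path/$arch/$repo"
  local names=() pkg
  for pkg in "$@"; do
    names+=("${pkg##*/}")   # repo-add runs inside $dir, so pass base names
  done
  mkdir -p "$dir" || return 1
  # Copy instead of mv, so the build directory keeps its artifacts.
  cp -- "$@" "$dir/" || return 1
  # pacman fetches REPO.db from the Server URL; repo-add creates it as a
  # symlink next to REPO.db.tar.gz.
  (cd "$dir" && "${REPO_ADD:-repo-add}" "$repo.db.tar.gz" "${names[@]}")
}
</pre>

<p>For example, <code>publish_pkgs <u>repo_path</u> wasm32 core raylib-4.0.0-1-wasm32.pkg.tar.zst</code> should
reproduce the directory and database from above.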


<h3>Installing the Package</h3>
<p>This section may contain misuse of tools for the sake of experimentation. If you are faint-hearted or feel the need
to say "this is not how to do it" or "this is not how you use it" without elaborating or suggesting another direction,
then it's probably better not to continue, or to have a drink first.
<p>Before doing anything else, I fixed the directory structure of the cache to match the one <b>pacman</b> expects:

<pre>
$ cd <u>cache</u>/sysroot
$ mkdir usr
$ mv include lib bin usr
$ ln -s usr/{include,lib,bin} .
</pre>
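<p>Since a host may contain several caches, the same restructuring can be kept as a re-runnable function. This is a
sketch under the assumption that the sysroot path is passed as the only argument; <code>restructure_sysroot</code> is a
hypothetical name. It skips directories that are already symlinks and refuses to merge if both, e.g.,
<code>include</code> and <code>usr/include</code> exist as real directories:

<pre>
# Hypothetical helper: move include/, lib/ and bin/ of an Emscripten cache
# sysroot under usr/ and leave relative symlinks behind, so packages install
# into usr/... while Emscripten keeps finding the top-level paths.
# Safe to run more than once.
# Usage: restructure_sysroot CACHE/sysroot
restructure_sysroot() {
  local sysroot=$1 dir
  mkdir -p "$sysroot/usr" || return 1
  for dir in include lib bin; do
    # Converted on a previous run: nothing left to do.
    [ -L "$sysroot/$dir" ] && continue
    if [ -d "$sysroot/$dir" ]; then
      if [ -e "$sysroot/usr/$dir" ]; then
        echo "restructure_sysroot: both $dir and usr/$dir exist" >&2
        return 1
      fi
      mv "$sysroot/$dir" "$sysroot/usr/$dir" || return 1
    else
      # Missing in this cache: create it so the symlink isn't dangling.
      mkdir -p "$sysroot/usr/$dir" || return 1
    fi
    # Relative link, so the whole sysroot can be moved around.
    ln -s "usr/$dir" "$sysroot/$dir" || return 1
  done
}
</pre>
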

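<p>The symlinks keep the old top-level paths working for Emscripten while packages install into <code>usr</code>, which
should make everyone happy for now.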
<p>The next step was to create the directories used directly by <b>pacman</b>:

<pre>
$ mkdir -p etc/pacman.d/{gnupg,hooks} var/{cache/pacman,lib/pacman,log}
</pre>

<p>And finally, the first thing worth attention: the config file located at <code>etc/pacman.conf</code>. The plan was
to use <b>pacman</b> in a bootstrap fashion for the sysroot located in the cache directory, so I needed to express that
in config terms:

<pre>
[options]
RootDir = <u>cache</u>/sysroot/
CacheDir = <u>cache</u>/sysroot/var/cache/pacman
HookDir = <u>cache</u>/sysroot/etc/pacman.d/hooks
GPGDir = <u>cache</u>/sysroot/etc/pacman.d/gnupg
Architecture = wasm32
CheckSpace
SigLevel = TrustAll
</pre>

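<p>Some directories were automatically re-rooted and some weren't: the database and log file followed
<code>RootDir</code>, while the cache, hook, and GPG directories had to be set explicitly. I simply experimented with the
<code>-v</code> option to see which paths were used and adjusted the config until I ended up with this version. I
shouldn't need to mention <code>TrustAll</code>, but: don't do it.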
<p>That's not all; repositories are also defined in the config file:

<pre>
[core]
Server = file:///<u>repo_path</u>/$arch/$repo
</pre>
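<p>The two steps, the directory skeleton and the config, can be combined into one hypothetical helper,
<code>write_pacman_conf</code>, that takes the cache and repository paths as arguments. It is a sketch that reproduces
the config above verbatim, including <code>TrustAll</code>, and it assumes the repository path is absolute because it
is appended directly to <code>file://</code>:

<pre>
# Hypothetical helper: create the directories pacman uses inside the cache
# sysroot and write etc/pacman.conf with the [options] and [core] sections
# shown above. REPO_PATH must be absolute, since it follows file://.
# Usage: write_pacman_conf /path/to/cache /path/to/repo
write_pacman_conf() {
  local cache=$1 repo_path=$2
  local root="$cache/sysroot"
  mkdir -p "$root/etc/pacman.d/gnupg" "$root/etc/pacman.d/hooks" \
    "$root/var/cache/pacman" "$root/var/lib/pacman" "$root/var/log" || return 1
  {
    printf '[options]\n'
    printf 'RootDir = %s/\n' "$root"
    printf 'CacheDir = %s/var/cache/pacman\n' "$root"
    printf 'HookDir = %s/etc/pacman.d/hooks\n' "$root"
    printf 'GPGDir = %s/etc/pacman.d/gnupg\n' "$root"
    printf 'Architecture = wasm32\n'
    printf 'CheckSpace\n'
    # Mirrors the experiment; don't use TrustAll for anything that matters.
    printf 'SigLevel = TrustAll\n'
    printf '\n[core]\n'
    # Single quotes keep $arch and $repo literal for pacman to expand.
    printf 'Server = file://%s/$arch/$repo\n' "$repo_path"
  } > "$root/etc/pacman.conf"
}
</pre>

<p>With <code>Architecture = wasm32</code> and the section named <code>[core]</code>, <b>pacman</b> expands
<code>$arch</code> and <code>$repo</code> so that the server points at <code><u>repo_path</u>/wasm32/core</code>, the
same directory the repository was created in.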

<p>What's left is to sync the database and install the package. <b>pacman</b> assumes that it is run as the root user,
but because I'm working with a user-owned cache as my root directory, I'd prefer not to raise its privileges, especially
considering that a misconfiguration could break packages on the host system. Instead, <b>fakeroot</b> makes
<b>pacman</b> believe it runs as root without granting any real privileges. Let's try it out:

<pre>
$ fakeroot pacman --config <u>cache</u>/sysroot/etc/pacman.conf -Sy
:: Synchronising package databases...
 core   418.0   B   408 KiB/s 00:00 [###################################] 100%
$ fakeroot pacman --config <u>cache</u>/sysroot/etc/pacman.conf -S raylib
resolving dependencies...
looking for conflicting packages...

Packages (1) raylib-4.0.0-1

Total Download Size:   2.04 MiB
Total Installed Size:  4.70 MiB

:: Proceed with installation? [Y/n]
:: ...
(1/1) installing raylib             [###################################] 100%
</pre>

<p>Looks like the installation process succeeded. Time to try it out.
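<p>Typing the <code>--config</code> path for every call gets old quickly. A hypothetical wrapper, <code>empacman</code>,
can fill it in from an <code>EMPACMAN_CACHE</code> variable; both names are made up for this sketch:

<pre>
# Hypothetical wrapper: run pacman against the cache sysroot's own config.
# fakeroot only makes pacman believe it runs as root; no real privileges are
# gained and the installed files stay owned by your user.
# Usage: EMPACMAN_CACHE=/path/to/cache empacman -Sy
#        EMPACMAN_CACHE=/path/to/cache empacman -S raylib
empacman() {
  if [ -z "${EMPACMAN_CACHE-}" ]; then
    echo "empacman: EMPACMAN_CACHE is not set" >&2
    return 1
  fi
  fakeroot pacman --config "$EMPACMAN_CACHE/sysroot/etc/pacman.conf" "$@"
}
</pre>

<p>Then <code>EMPACMAN_CACHE=<u>cache</u> empacman -Sy</code> and <code>EMPACMAN_CACHE=<u>cache</u> empacman -S
raylib</code> correspond to the two commands above. Leaving out <code>--config</code> by accident would make
<b>pacman</b> read the host's <code>/etc/pacman.conf</code>, so keeping it in one place is mostly a safety net.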


<h3>Trying It Out and Adjusting pkg-config</h3>
<p>Turns out it doesn't work just yet. Some samples worked, but not this one.
<p>The <b>raylib</b> CMake module has a very peculiar way of defining its target. First it asks <b>pkg-config</b> for
hints and then uses them in a slightly inconsistent way. Long story short, the CMake target gets its linker options from
the output of <code>pkg-config --libs --static</code>, disregarding any attempts to remove <code>-L</code> options.
<p>Since I built my package with <code>CMAKE_INSTALL_PREFIX</code> set to <code>/usr</code>, the <code>prefix</code>
variable in the installed <code>raylib.pc</code> is set to <code>/usr</code>. This results in <code>-L/usr/lib</code>
appearing in the public linker options of the raylib target, pointing the linker at the host's <code>/usr/lib</code>
instead of the sysroot, which breaks the entire build process.
<p>The problem here is the <code>prefix=/usr</code> in the <code>.pc</code> file. It should point to the actual root,
which is located in the cache directory.
<p>There are several ways to address it. My favourite was to simply rewrite the prefix as part of an install hook run
by <b>pacman</b>. Sadly, it failed because hooks are run in a chroot, which needs real root privileges. There are ways to
fake it, but I didn't find them worth exploring at that moment. The other way was <code>PKG_CONFIG_SYSROOT_DIR</code>,
and that's what I did, even though I had tried to avoid it due to the uncertain situation between <b>pkgconf</b> and
<b>pkg-config</b>.
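<p>The effect of <code>PKG_CONFIG_SYSROOT_DIR</code> can also be checked by hand, outside of CMake. The sketch below is
not the <code>Emscripten.cmake</code> patch; it's a hypothetical <code>em_pkg_config</code> wrapper that assumes the
<code>.pc</code> files end up in <code>usr/lib/pkgconfig</code> or <code>usr/share/pkgconfig</code> of the cache
sysroot. <code>PKG_CONFIG</code> selects the binary, so either <b>pkg-config</b> or <b>pkgconf</b> can be tried:

<pre>
# Hypothetical wrapper: query pkg-config (or pkgconf) about packages installed
# into the cache sysroot. PKG_CONFIG_LIBDIR limits the search to the sysroot's
# .pc files, and PKG_CONFIG_SYSROOT_DIR asks the tool to prepend the sysroot
# to the -I and -L paths it prints. PKG_CONFIG picks the binary to run.
# Usage: em_pkg_config /path/to/cache --libs --static raylib
em_pkg_config() {
  if [ "$#" -lt 1 ]; then
    echo "usage: em_pkg_config CACHE [pkg-config options...]" >&2
    return 1
  fi
  local sysroot="$1/sysroot"
  shift
  PKG_CONFIG_LIBDIR="$sysroot/usr/lib/pkgconfig:$sysroot/usr/share/pkgconfig" \
  PKG_CONFIG_SYSROOT_DIR="$sysroot" \
    "${PKG_CONFIG:-pkg-config}" "$@"
}
</pre>

<p>Running <code>em_pkg_config <u>cache</u> --libs --static raylib</code> should then print <code>-L</code> paths inside
the cache sysroot rather than <code>-L/usr/lib</code>. The exact output may differ between <b>pkg-config</b> and
<b>pkgconf</b>, so it's worth checking with the one CMake actually calls.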
<p>Luckily, it turned out good enough for me to wrap up the whole experiment. I patched the <code>Emscripten.cmake</code>
toolchain file accordingly and was able to build a sample project that used the installed sample package.
<p>Should I show something here? Nah.</p>
<img src="using_pacman_to_manage_emscripten_packages-3.png" alt="tool... chain?">


<h3>Final Notes</h3>
<p>This was a fun experiment. For some reason, I really enjoyed that use of <b>fakeroot</b>.
<p>Package management, or rather dependency management, in a cross-compilation context sounds like a good next direction
to explore. I found various takes on it. GNU toolchains are a little more standardized, and there are projects like
<a href="https://crosstool-ng.github.io/">crosstool-NG</a> that at the very least ease the configuration of toolchains.
I couldn't find many examples of installable binary packages for the target, with the exception of the standard library.
Instead, the usual approach seems to be compiling ports yourself (which is fine), e.g., from incredibly complex CMake
trees (which is fine, but with flames in the background). Otherwise, people use <b>vcpkg</b> or a similar manager. Or do
something wild.
<p>As for anything else worth noting... I hope I pointed out everything that I wanted in the article itself. If not,
well, it happens.
</article>
<script src="https://stats.ignore.pl/track.js"></script>