Using pacman to Manage Emscripten Packages
Published on 2022-06-19 17:35:00+02:00
C was created for use with Unix, and it quickly became one of the most widely used programming languages of all time.
Years later, it made its way into the Linux kernel and the operating system built around it. To this day it is the
primary language for interfacing with the kernel and for writing all sorts of utilities, if not directly, then through
various bindings.
To use external libraries, C lets you include header files in your own source code. Then, during the linking stage,
the compiled implementation behind those headers is linked together with your code into the final executable (or
resolved by the dynamic linker at run time, with some additional steps). What is visible for inclusion and linking is
configured through several PATH-like variables with some defaults, and sometimes (if you were having a good day and it
had to be ruined) through hidden or undocumented behaviour.
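As a rough illustration of those PATH-like variables: GCC reads extra include directories from CPATH and extra
link-time library directories from LIBRARY_PATH, pkg-config reads PKG_CONFIG_PATH, and the dynamic linker consults
LD_LIBRARY_PATH at run time. The helper below is a hypothetical sketch that prepends one prefix's directories to all
four without clobbering what is already set:

```shell
# Prepend PREFIX's include/lib directories to the PATH-like variables
# consulted by GCC (CPATH, LIBRARY_PATH), pkg-config (PKG_CONFIG_PATH)
# and the dynamic linker at run time (LD_LIBRARY_PATH).
add_prefix() {
    local p=${1:?usage: add_prefix PREFIX}
    # ${VAR:+:$VAR} appends ":old value" only when VAR is non-empty. This avoids
    # a trailing ':', since for several of these an empty entry means the
    # current directory.
    export CPATH="$p/include${CPATH:+:$CPATH}"
    export LIBRARY_PATH="$p/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"
    export PKG_CONFIG_PATH="$p/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
    export LD_LIBRARY_PATH="$p/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

For example, `add_prefix "$HOME/.local"` makes headers and libraries installed under `~/.local` visible to the next
compiler invocation in that shell. The most recently added prefix wins because it is searched first.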
Managing the packages that provide headers and libraries is usually left to the system-wide package manager.
Considering how closely C is tied to the system that hosts it, that isn't a bad choice. It is not perfect, but with
well-maintained upstream and local ecosystems it works just right.
Problems appear when we change or take away one of the parts: the operating system, the C toolchain, or the package
manager. The most prominent examples are Windows, cross-compiling, and porting software between distros. Such cases,
especially the first one, led to the creation of external package managers, e.g.,
vcpkg and Conan. In other cases they pushed people toward
build generators such as CMake.
Recently, I've been playing around with Emscripten. I built some things here
and there, and, you guessed it, now I'm trying out different approaches to handling libraries and decided to explore
pacman(1) as a means to that end. I hope you enjoy this little experiment.
Without going into internals, pacman is a package manager used by Arch
Linux, a distribution that describes itself as lightweight, flexible, and simple. It focuses on bleeding-edge
packages. I picked it because I happen to use it on a daily basis.
Packages are distributed in binary form and come from remote repositories. A package is an archive that contains the
files meant for installation plus some metadata, all built by
makepkg(8). A repository is really just a set of files managed
by repo-add(8).
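The metadata part is easy to peek at: makepkg writes it to a plain-text .PKGINFO file at the root of the archive, one
`key = value` pair per line. The function name `pkginfo_field` below is hypothetical, not part of pacman; it is a
small sketch that prints a field from that text:

```shell
# Print every value of FIELD from .PKGINFO text read on stdin.
# Each line has the form "key = value"; keys such as depend or license
# may appear more than once, so all matches are printed.
pkginfo_field() {
    local field=${1:?usage: pkginfo_field FIELD} line
    # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            "$field = "*) printf '%s\n' "${line#"$field = "}" ;;
        esac
    done
}
```

With a real package, something like `bsdtar -xOf raylib-4.0.0-1-wasm32.pkg.tar.zst .PKGINFO | pkginfo_field arch`
should print `wasm32`, since that is the architecture the package is built for further below.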
Building a Sample Package
I started by creating a sample package that provides raylib. To do that, I
wrote a rather simple PKGBUILD file:
pkgname=raylib
pkgver=4.0.0
pkgrel=1
arch=(wasm32)
license=(zlib)
makedepends=(cmake emscripten)
source=("${pkgname}.tar.gz::https://github.com/raysan5/raylib/archive/refs/tags/${pkgver}.tar.gz")
sha256sums=("11f6087dc7bedf9efb3f69c0c872f637e421d914e5ecea99bbe7781f173dc38c")
Stop, right now! If you are a seasoned package maintainer, or you have simply cross-compiled enough software, you will
notice that something is not right here. Yeah, arch is wrong. It's a little bit counter-intuitive, so take a look at
another example, aarch64-linux-gnu-glibc, the GNU C Library for ARM64 targets:
$ asp checkout aarch64-linux-gnu-glibc
$ cd aarch64-linux-gnu-glibc/trunk
$ grep arch= PKGBUILD
arch=(any)
This is different for a good reason: nothing in this package is going to run on the host system. Only the cross
compiler and binutils run there, and those are built for the architecture of the build host, x86_64 in this case.
Then why am I specifying wasm32 for my package?
Emscripten uses cache directories that contain a copy of the sysroot. A host system may contain several caches, and
each has its own sysroot. I'm not entirely sure what the reasoning behind it is, but that's how it looks at the time
of writing.
The glibc package uses the any architecture because it is intended to be installed in /usr/aarch64-linux-gnu, which is
where the compiler expects to find it. I could technically try to make my package work in a similar manner and install
to /usr/lib/emscripten/system, which acts as the base for caches and is provided by the emscripten package from the
Arch Linux repositories. I didn't do that because I wanted installed packages to be immediately available in my cache.
To accomplish that, I decided to use pacman the way you would when bootstrapping a new system installation, and
because the package is technically targeted at wasm32, I wrote that in the PKGBUILD.
I think the normal way is also worth exploring, assuming I first figure out how to deal with caches, why the
emscripten package does not install to the usual /usr/wasm32-emscripten, and how to handle propagation of packages.
Anyway, I went the other way and had to hack my way through. Let's continue with the PKGBUILD:
build() {
cd "${pkgname}-${pkgver}"
emcmake cmake . -B build \
-DPLATFORM=Web \
-DBUILD_EXAMPLES=OFF \
-DCMAKE_INSTALL_PREFIX=/usr
cd build
make
}
package() {
cd "${pkgname}-${pkgver}/build"
make DESTDIR="${pkgdir}" install
cd ..
install -Dm644 LICENSE "${pkgdir}/usr/share/licenses/${pkgname}/LICENSE"
}
I use the CMake wrapper from the Emscripten tools. The only part worth noting is that, by default, CMake with
Emscripten sets CMAKE_INSTALL_PREFIX to the path of the currently used cache directory. That's not suitable for
staging packages meant for distribution, so I use plain /usr instead. The thing is, Emscripten uses the include and lib
directories located directly in the sysroot, not under /usr, so I will need to adjust that somehow at a later stage.
Not now, though: setting the prefix to / would not help, because raylib uses GNUInstallDirs, which expands the /
prefix to /usr anyway.
The package is ready to be built:
$ makepkg --printsrcinfo >.SRCINFO
$ CFLAGS='' CARCH=wasm32 makepkg
==> Making package: raylib 4.0.0-1
==> ...
==> Finished making: raylib 4.0.0-1
$ ls *.pkg.tar.zst
raylib-4.0.0-1-wasm32.pkg.tar.zst
First off, I clear CFLAGS to keep the default options from /etc/makepkg.conf from causing problems.
I also need to set CARCH to tell makepkg that I'm cross-compiling to wasm32.
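CARCH matters twice here: makepkg refuses to build a PKGBUILD whose arch array does not list the current architecture
(unless told to ignore it), and the architecture ends up in the package file name. The sketch below roughly mimics both
behaviours for illustration only. `expected_pkgfile` is a hypothetical helper, and it assumes the default
`PKGEXT='.pkg.tar.zst'`, no epoch, and a PKGBUILD that only defines variables and functions at top level:

```shell
# Print the file name makepkg would be expected to produce for ./PKGBUILD,
# or fail when CARCH is not one of the listed architectures.
# Runs in a subshell so the PKGBUILD variables do not leak into the caller.
expected_pkgfile() (
    # Sourcing only defines variables and functions; nothing is built.
    . ./PKGBUILD || exit 1
    target=${CARCH:?set CARCH, e.g. CARCH=wasm32}
    found=
    for a in "${arch[@]}"; do
        case $a in
            any) target=any; found=1 ;;   # arch=(any) always applies
            "$target") found=1 ;;
        esac
    done
    if [ -z "$found" ]; then
        echo "$pkgname is not available for the '$target' architecture" >&2
        exit 1
    fi
    printf '%s-%s-%s-%s%s\n' "$pkgname" "$pkgver" "$pkgrel" "$target" "${PKGEXT:-.pkg.tar.zst}"
)
```

Run next to the PKGBUILD above, `CARCH=wasm32 expected_pkgfile` gives the same name that `ls` showed, while plain
`CARCH=x86_64` is rejected, which is exactly why the build command needs the override.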
Setting up a Repository
Now that I had the package, I needed to "distribute" it. Repositories used by pacman are dead simple. They can
be served over HTTP, FTP, or even from local files. The structure is the same for all methods and relies on the file
system, paths, and a central database file. The whole setup was:
$ mkdir -p repo_path/wasm32/core
$ cd repo_path/wasm32/core
$ mv package_path/raylib-4.0.0-1-wasm32.pkg.tar.zst .
$ repo-add core.db.tar.gz *.pkg.tar.zst
Yeah, that's it. First, create a directory for the repository and move into it. The path contains both the
architecture and the name of the repository. After that, move the built package into the same directory, and finally
add it to a database that has the same name as the repository. Now it's a matter of making pacman use it.
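Since the layout is just `<root>/<arch>/<repo>/` plus a database named after the repository, the steps can be wrapped
in a small function. This is a hypothetical sketch: `publish_pkg` and the `REPO_ROOT`/`REPO_ADD` variables are my own
names, the architecture is parsed from the package file name, and it copies the package instead of moving it:

```shell
# Add a built package to a local pacman repository laid out as
# REPO_ROOT/<arch>/<repo>/, the same layout as the manual steps above.
# The architecture is read from the file name: name-ver-rel-ARCH.pkg.tar.*
publish_pkg() {
    local pkg=${1:?usage: publish_pkg PACKAGE REPO} repo=${2:?usage: publish_pkg PACKAGE REPO}
    local root=${REPO_ROOT:?set REPO_ROOT to the repository root}
    local file base arch dir
    file=$(basename -- "$pkg")        # raylib-4.0.0-1-wasm32.pkg.tar.zst
    base=${file%.pkg.tar.*}           # raylib-4.0.0-1-wasm32
    arch=${base##*-}                  # wasm32
    dir=$root/$arch/$repo
    mkdir -p -- "$dir" || return 1
    cp -- "$pkg" "$dir/" || return 1  # copy instead of mv, keeping the original
    # The database is named after the repository, e.g. core.db.tar.gz.
    # REPO_ADD is a hypothetical override; by default repo-add(8) is called.
    ( cd -- "$dir" && "${REPO_ADD:-repo-add}" "$repo.db.tar.gz" "$file" )
}
```

Usage would look like `REPO_ROOT=/repo_path publish_pkg raylib-4.0.0-1-wasm32.pkg.tar.zst core`, which ends up with the
same `/repo_path/wasm32/core` directory as the commands above.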
Installing the Package
This section may contain wrong uses of tools for the sake of experimentation. If you are faint-hearted or feel the
need to say "this is not how to do it" or "this is not how you use it" without elaborating or suggesting another
direction, then it's probably better not to continue, or to have a drink first.
Before doing anything else, I fixed the directory structure of the cache to match the one pacman expects:
$ cd cache/sysroot
$ mkdir usr
$ mv include lib bin usr
$ ln -s usr/{include,lib,bin} .
Symlinks should make everyone happy for now.
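Because a fresh cache may need the same treatment again, here is the restructuring as a re-runnable function. It is a
hypothetical sketch (the name `usrmerge_sysroot` is mine): it does the same `mv` and `ln -s` as above, skips
directories that are already symlinks, and creates any of the three directories that happen to be missing:

```shell
# Move include/, lib/ and bin/ of an Emscripten cache sysroot under usr/
# and leave relative symlinks behind, so both layouts resolve.
# Safe to re-run: entries that are already symlinks are skipped.
usrmerge_sysroot() {
    local sysroot=${1:?usage: usrmerge_sysroot SYSROOT} d
    mkdir -p -- "$sysroot/usr" || return 1
    for d in include lib bin; do
        if [ -L "$sysroot/$d" ]; then
            continue                                  # already converted
        elif [ -d "$sysroot/$d" ]; then
            if [ -e "$sysroot/usr/$d" ]; then
                echo "refusing to overwrite $sysroot/usr/$d" >&2
                return 1
            fi
            mv -- "$sysroot/$d" "$sysroot/usr/$d" || return 1
        else
            mkdir -p -- "$sysroot/usr/$d" || return 1 # missing: create it
        fi
        # Relative target, so the sysroot can be moved or copied as a whole.
        ln -s -- "usr/$d" "$sysroot/$d" || return 1
    done
    return 0
}
```

For the cache used here, that would be `usrmerge_sysroot cache/sysroot`.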
The next step was to create the directories used directly by pacman:
$ mkdir -p etc/pacman.d/{gnupg,hooks} var/{cache/pacman,lib/pacman,log}
And finally, the first thing worth attention: the config file located at etc/pacman.conf. The plan was to use pacman
in a bootstrap fashion for the sysroot located in the cache directory, so I needed to express that in config terms:
[options]
RootDir = cache/sysroot/
CacheDir = cache/sysroot/var/cache/pacman
HookDir = cache/sysroot/etc/pacman.d/hooks
GPGDir = cache/sysroot/etc/pacman.d/gnupg
Architecture = wasm32
CheckSpace
SigLevel = TrustAll
Some directories were automatically re-rooted and some weren't. I simply experimented with the -v option to see which
paths were used and adjusted the config until I ended up with this version. I don't need to mention TrustAll.
Don't do it.
That's not all; repositories also reside in the config file:
[core]
Server = file:///repo_path/$arch/$repo
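The relative paths above only work when pacman is started from the right directory. As a sketch of a less fragile
variant, the hypothetical function below creates the same directories and writes the same config with absolute paths.
Spelling out DBPath and LogFile explicitly is my own addition (both are regular pacman.conf options), and SigLevel is
copied from the config above, TrustAll warning included:

```shell
# Create the directories pacman needs inside SYSROOT and write
# SYSROOT/etc/pacman.conf with absolute paths, so pacman can be started
# from any working directory. REPO_ROOT must be an absolute path.
write_pacman_conf() {
    local usage='usage: write_pacman_conf SYSROOT REPO_ROOT'
    local sysroot repo_root=${2:?$usage}
    sysroot=$(cd -- "${1:?$usage}" && pwd) || return 1
    case $repo_root in
        /*) ;;
        *) echo "REPO_ROOT must be absolute: $repo_root" >&2; return 1 ;;
    esac
    mkdir -p -- "$sysroot"/etc/pacman.d/{gnupg,hooks} \
        "$sysroot"/var/{cache/pacman,lib/pacman,log} || return 1
    # Unquoted EOF: $sysroot and $repo_root are expanded by the shell, while
    # \$arch and \$repo are written literally for pacman to substitute.
    cat > "$sysroot/etc/pacman.conf" <<EOF
[options]
RootDir = $sysroot/
DBPath = $sysroot/var/lib/pacman/
CacheDir = $sysroot/var/cache/pacman/
LogFile = $sysroot/var/log/pacman.log
HookDir = $sysroot/etc/pacman.d/hooks/
GPGDir = $sysroot/etc/pacman.d/gnupg/
Architecture = wasm32
CheckSpace
SigLevel = TrustAll

[core]
Server = file://$repo_root/\$arch/\$repo
EOF
}
```

With the names used in this article, that would be `write_pacman_conf cache/sysroot /repo_path`.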
What's left is to sync the database and install the package. pacman assumes that it runs as the root user, but
because I'm working with a user-owned cache as my root directory, I'd prefer not to raise its privileges, especially
considering that a misconfiguration could break packages on the host system. Let's try it out:
$ fakeroot pacman --config cache/sysroot/etc/pacman.conf -Sy
:: Synchronising package databases...
core 418.0 B 408 KiB/s 00:00 [###################################] 100%
$ fakeroot pacman --config cache/sysroot/etc/pacman.conf -S raylib
resolving dependencies...
looking for conflicting packages...
Packages (1) raylib-4.0.0-1
Total Download Size: 2.04 MiB
Total Installed Size: 4.70 MiB
:: Proceed with installation? [Y/n]
:: ...
(1/1) installing raylib [###################################] 100%
Looks like the installation process succeeded. Time to try it out.
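Typing the fakeroot prefix and the full --config path for every call gets old quickly. A hypothetical wrapper
(`wasm_pacman`, `WASM_SYSROOT`, `FAKEROOT` and `PACMAN` are my names, not anything pacman provides) keeps every
invocation pointed at the cache sysroot:

```shell
# Run pacman against the cache sysroot instead of the host system.
# fakeroot keeps the process unprivileged; setting FAKEROOT= (empty)
# drops it, and PACMAN can point at a different binary.
wasm_pacman() {
    local sysroot=${WASM_SYSROOT:?set WASM_SYSROOT to the cache sysroot}
    # ${FAKEROOT-fakeroot} is unquoted on purpose: an empty value vanishes.
    ${FAKEROOT-fakeroot} "${PACMAN:-pacman}" --config "$sysroot/etc/pacman.conf" "$@"
}
```

With `WASM_SYSROOT=$PWD/cache/sysroot` exported, the commands above become `wasm_pacman -Sy` and `wasm_pacman -S raylib`,
and `wasm_pacman -Ql raylib` lists the installed files, which is a quick way to check that they landed under usr/ in
the sysroot rather than anywhere on the host.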
Trying It Out and Adjusting pkg-config
It turns out it doesn't work just yet. Some samples would work, but not this one.
The raylib CMake module has a very peculiar way of defining its target. First it asks pkg-config for hints, and then
it uses them in a slightly inconsistent way. Long story short, the CMake target will have its linker options set based
on the output of pkg-config --libs --static, disregarding any attempts to remove -L options.
Since I built my package with CMAKE_INSTALL_PREFIX set to /usr, the prefix variable in the installed raylib.pc will be
set to /usr. This results in -L/usr/lib appearing in the public linker options of the raylib target, which breaks the
entire build process.
The problem here is the prefix=/usr in the module definition file. It should point to the actual root, which is
located in the cache directory.
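To make the problem concrete, the single line at fault can be rewritten by hand from the host shell after installation.
This is an illustrative sketch, not the approach I ended up using: the function name is hypothetical, and it assumes
GNU sed (for `-i`), .pc files under usr/lib/pkgconfig or usr/share/pkgconfig, and other variables being defined
relative to `${prefix}`. Because the file then differs from what pacman installed, an upgrade or reinstall of the
package puts `prefix=/usr` back, so the rewrite has to be repeated:

```shell
# Point prefix= in every installed .pc file at the sysroot instead of /usr.
rebase_pc_files() {
    local sysroot=${1:?usage: rebase_pc_files SYSROOT} pc
    for pc in "$sysroot"/usr/lib/pkgconfig/*.pc "$sysroot"/usr/share/pkgconfig/*.pc; do
        [ -f "$pc" ] || continue   # the glob matched nothing
        # Only the exact "prefix=/usr" line is changed; lines such as
        # libdir=${prefix}/lib follow it automatically.
        sed -i "s|^prefix=/usr\$|prefix=$sysroot/usr|" "$pc" || return 1
    done
}
```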
There are several ways to address it. My favourite was to simply rewrite the prefix as part of an install hook run by
pacman. Sadly, that failed because hooks run in a chroot. There are ways to fake it, but I didn't find them worth
exploring at that moment. The other way was PKG_CONFIG_SYSROOT_DIR, and that's what I went with.
I had tried to avoid PKG_CONFIG_SYSROOT_DIR because of the uncertain situation between pkgconf and pkg-config.
Luckily, it turned out good enough for me to wrap up the whole experiment. I patched the Emscripten.cmake toolchain file
and was able to build a sample project that used the installed sample package.
Should I show something here? Nah.
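The toolchain patch itself stays unshown, but for illustration here is a shell-side sketch of the same idea that needs
no patch: set PKG_CONFIG_SYSROOT_DIR only for the configure step, so that pkg-config (or pkgconf) prepends the sysroot
to the -I and -L paths it prints, turning -L/usr/lib into -L<sysroot>/usr/lib. The function name and the EMCMAKE
override are hypothetical. It assumes the toolchain file does not reset the variable and that, as in the setup above,
pkg-config already finds raylib.pc:

```shell
# Run "emcmake cmake ARGS..." with PKG_CONFIG_SYSROOT_DIR pointing at the
# cache sysroot. The variable is set only for this one command.
configure_for_sysroot() {
    local sysroot=${1:?usage: configure_for_sysroot SYSROOT [cmake args...]}
    shift
    # EMCMAKE is a hypothetical override; it is unquoted on purpose so
    # that an empty value drops the emcmake wrapper entirely.
    PKG_CONFIG_SYSROOT_DIR=$sysroot ${EMCMAKE-emcmake} cmake "$@"
}
```

A call might look like `configure_for_sysroot "$(em-config CACHE)/sysroot" -B build .`, where using `em-config CACHE` to
locate the cache directory is an assumption about the installed Emscripten version.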
Final Notes
This was a fun experiment. For some reason I really enjoyed using fakeroot there.
Package management, or rather dependency management, in a cross-compilation context sounds like a good next direction
to explore. I found various takes on it. The GNU side is a little more standardized, and there are projects like
crosstool-NG that at the very least ease the configuration of toolchains.
I couldn't find many examples of installable binary packages for the target, with the exception of the standard
library. Instead, it seems that the usual approach is to compile ports yourself (which is fine), e.g., from incredibly
complex CMake trees (which is fine, but with flames in the background). Otherwise, you use vcpkg or a similar manager.
Or you do something wild.
As for anything else worth noting... I hope I pointed out everything that I wanted in the article itself. If not,
well, it happens.